Date: Tue, 1 Dec 2015 16:19:27 +0200
From: "Michael S. Tsirkin"
To: Greg Kurz
Cc: Paolo Bonzini, "Aneesh Kumar K.V", qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
Message-ID: <20151201161445-mutt-send-email-mst@redhat.com>
In-Reply-To: <20151201143119.42af4ae1@bahia.local>

On Tue, Dec 01, 2015 at 02:31:19PM +0100, Greg Kurz wrote:
> On Tue, 1 Dec 2015 12:57:47 +0200
> "Michael S. Tsirkin" wrote:
>
> > On Tue, Dec 01, 2015 at 04:23:11PM +0530, Aneesh Kumar K.V wrote:
> > > "Michael S. Tsirkin" writes:
> > >
> > > > On Mon, Nov 30, 2015 at 02:46:31PM +0100, Greg Kurz wrote:
> > > >> On Mon, 30 Nov 2015 15:06:33 +0200
> > > >> "Michael S. Tsirkin" wrote:
> > > >>
> > > >
> > > > ....
> > > >>
> > > >> On ppc64, the address space is divided into 256MB-sized segments where all pages
> > > >> have the same size. This is a hw limitation IIUC.
> > > >> I don't know if it can be fixed and I'll let Ben comment on it.
> > > >
> > > > But it's anonymous memory with PROT_NONE. There should be no pages there:
> > > > just a chunk of virtual memory reserved.
> > > >
> > >
> > > ppc64 uses the page size (called the base page size) to find the hash slot in
> > > which we find the virtual address to real address translation. All the
> > > pages in a segment should have the same base page size. Hugetlb pages have a
> > > base page size of 16M whereas a regular Linux page has 64K. mmap will
> > > fail to map a hugetlb mapping in a segment that already has regular
> > > pages mapped.
> > >
> > > -aneesh
> >
> >
> > I see this in the kernel:
> >
> > 	} else if (flags & MAP_HUGETLB) {
> > 		struct user_struct *user = NULL;
> > 		struct hstate *hs;
> >
> > 		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & SHM_HUGE_MASK);
> > 		if (!hs)
> > 			return -EINVAL;
> >
> > 		len = ALIGN(len, huge_page_size(hs));
> > 		/*
> > 		 * VM_NORESERVE is used because the reservations will be
> > 		 * taken when vm_ops->mmap() is called
> > 		 * A dummy user value is used because we are not locking
> > 		 * memory so no accounting is necessary
> > 		 */
> > 		file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
> > 				VM_NORESERVE,
> > 				&user, HUGETLB_ANONHUGE_INODE,
> > 				(flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
> > 		if (IS_ERR(file))
> > 			return PTR_ERR(file);
> > 	}
> >
> > So maybe it's a question of passing in MAP_HUGETLB and the
> > correct size mask.
> >
>
> I guess you are talking about the PROT_NONE mapping here ^^.

Yes.

> How do we know that the fd points to hugepages?

Dunno... I guess we can just try this if the regular mmap fails?

> And what's the difference between passing MAP_HUGETLB and passing a
> hugetlbfs-backed fd + MAP_NORESERVE?

Does MAP_NORESERVE have the desired effect?
I need to look at the kernel code; the man page merely mentions
swap space use.

> I think the latter is easier
> because we don't need to guess if the backend is hugetlbfs.

If this helps, that's fine by me.
It's probably a good idea to set MAP_NORESERVE anyway.

-- 
MST
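For illustration, here is a minimal C sketch of the two ideas discussed above, under stated assumptions. It detects a hugetlbfs-backed fd with fstatfs() by comparing f_type against HUGETLBFS_MAGIC. It then reserves the PROT_NONE area with MAP_HUGETLB plus the page size encoded above MAP_HUGE_SHIFT, and falls back to a regular anonymous reservation if that mmap fails. fd_pagesize(), page_shift() and reserve_area() are hypothetical helpers, not QEMU code and not the patch under review. Linux is assumed, and the HUGETLBFS_MAGIC and MAP_HUGE_SHIFT values are copied from the kernel headers as fallbacks. MAP_NORESERVE is added on the assumption that it keeps the hugetlb reservation from consuming pages from the huge page pool, which is the open question in this thread.

```c
/*
 * Sketch only (hypothetical helpers, not QEMU code): reserve address
 * space with the same page size as the fd that will later back it.
 * Assumes Linux; the fallback constants mirror the kernel headers.
 */
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6 /* value from <linux/magic.h> */
#endif
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26 /* value from the kernel's mman headers */
#endif

/* Page size behind fd: f_bsize on hugetlbfs, else the base page size. */
static size_t fd_pagesize(int fd)
{
    struct statfs fs;

    if (fd >= 0 && fstatfs(fd, &fs) == 0 && fs.f_type == HUGETLBFS_MAGIC) {
        return (size_t)fs.f_bsize;
    }
    return (size_t)getpagesize();
}

/* log2 of a power-of-two page size, as encoded in the MAP_HUGE_* bits. */
static unsigned int page_shift(size_t pagesize)
{
    unsigned int shift = 0;

    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
    while (((size_t)1 << shift) < pagesize) {
        shift++;
    }
    return shift;
}

/*
 * Reserve size bytes of PROT_NONE address space for fd. For a hugetlbfs
 * fd, first try an anonymous MAP_HUGETLB reservation with the matching
 * size bits; if that fails, fall back to an ordinary anonymous
 * reservation. Returns NULL on failure.
 */
static void *reserve_area(int fd, size_t size)
{
    size_t pagesize = fd_pagesize(fd);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
    void *ptr = MAP_FAILED;

    if (pagesize > (size_t)getpagesize()) {
        unsigned int huge = page_shift(pagesize) << MAP_HUGE_SHIFT;
        ptr = mmap(NULL, size, PROT_NONE, flags | MAP_HUGETLB | (int)huge,
                   -1, 0);
    }
    if (ptr == MAP_FAILED) {
        ptr = mmap(NULL, size, PROT_NONE, flags, -1, 0);
    }
    return ptr == MAP_FAILED ? NULL : ptr;
}
```

A caller would reserve with reserve_area(fd, size) and then map the fd over the returned address with MAP_FIXED. Whether the MAP_HUGETLB reservation really avoids the ppc64 segment conflict Aneesh describes has not been verified here.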