From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: bpf@vger.kernel.org
Cc: daniel@iogearbox.net, andrii@kernel.org,
	torvalds@linux-foundation.org, brho@google.com,
	hannes@cmpxchg.org, lstoakes@gmail.com,
	akpm@linux-foundation.org, urezki@gmail.com, hch@infradead.org,
	linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next] mm: Introduce vm_area_[un]map_pages().
Date: Tue, 20 Feb 2024 11:26:13 -0800
Message-ID: <20240220192613.8840-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov <ast@kernel.org>

The vmap() API is used to map a set of pages into a contiguous range of kernel virtual address space.

BPF would like to extend the vmap API to implement a lazily populated,
contiguous kernel virtual address range whose size and start address are fixed early.

The vmap API has functions to request and release areas of kernel address space:
get_vm_area() and free_vm_area().

Introduce vm_area_map_pages(area, start_addr, count, pages)
to map a set of pages within a given area.
It performs the same sanity checks as vmap().
In addition, it checks that the area returned by get_vm_area() was created
with the VM_MAP flag (as all users of vmap() should be doing).

Also add vm_area_unmap_pages(), a safer alternative to the existing
vunmap_range() API: it verifies that the range lies within the given area.
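The argument checks performed by the two new functions can be modeled in user space to see which requests they accept. This is a minimal sketch, not kernel code: struct model_vm_area, model_map_checks() and model_unmap_checks() are hypothetical names, the VM_* values below are local stand-ins rather than the real ones from include/linux/vmalloc.h, a 4 KiB page size is assumed, and totalram_pages() is replaced by a parameter.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in flag values for this model only; the real VM_* flags
 * are defined in include/linux/vmalloc.h. */
#define VM_MAP               0x004UL
#define VM_USERMAP           0x008UL
#define VM_NO_GUARD          0x040UL
#define VM_FLUSH_RESET_PERMS 0x100UL
#define MODEL_PAGE_SIZE      4096UL /* assumed 4 KiB pages */

/* Hypothetical, trimmed-down stand-in for struct vm_struct. */
struct model_vm_area {
	unsigned long addr;  /* start of the reserved range */
	unsigned long size;  /* length of the mappable range in bytes */
	unsigned long flags; /* VM_* flags the area was created with */
};

/* Mirrors the checks in vm_area_map_pages(), in the same order.
 * Returns 0 where the kernel would go on to call vmap_pages_range(). */
static int model_map_checks(const struct model_vm_area *area, unsigned long addr,
			    unsigned int count, unsigned long total_ram_pages)
{
	unsigned long end = addr + (unsigned long)count * MODEL_PAGE_SIZE;

	if (area->flags & VM_FLUSH_RESET_PERMS)
		return -EINVAL;
	if (area->flags & VM_NO_GUARD)
		return -EINVAL;
	if (!(area->flags & VM_MAP))
		return -EINVAL;
	if (count > total_ram_pages)
		return -E2BIG;
	if (addr < area->addr || end > area->addr + area->size)
		return -ERANGE;
	return 0;
}

/* Mirrors vm_area_unmap_pages(): unlike a bare vunmap_range(),
 * the range must lie inside the area. */
static int model_unmap_checks(const struct model_vm_area *area, unsigned long addr,
			      unsigned int count)
{
	unsigned long end = addr + (unsigned long)count * MODEL_PAGE_SIZE;

	if (addr < area->addr || end > area->addr + area->size)
		return -ERANGE;
	return 0;
}
```

The model returns the same errno values (-EINVAL, -E2BIG, -ERANGE) in the same order as the patch, so a request that fails several checks reports the first one.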

The next commits will introduce bpf_arena, a sparsely populated memory
region shared between a BPF program and a user space process. It will map
privately managed pages into an existing vm area with the following steps:

  area = get_vm_area(area_size, VM_MAP | VM_USERMAP); // at bpf prog verification time
  vm_area_map_pages(area, kaddr, 1, page);            // on demand
  vm_area_unmap_pages(area, kaddr, 1);
  free_vm_area(area);                                 // after bpf prog is unloaded
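The reserve-then-populate lifecycle above has a rough user-space analogue in mmap(2): reserve a PROT_NONE range first, then back individual pages on demand. This sketch is only an analogy, assuming Linux mmap(2) semantics; the arena_* helpers are hypothetical and do not correspond to any kernel API.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS, MAP_NORESERVE under strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve a fixed range of address space with no accessible pages
 * (analogous to get_vm_area(): start address and size fixed up front). */
static void *arena_reserve(size_t size)
{
	void *p = mmap(NULL, size, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Back one page at addr with fresh zero-filled memory
 * (analogous to vm_area_map_pages(area, addr, 1, page)). */
static int arena_map_page(void *addr, size_t page)
{
	void *p = mmap(addr, page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}

/* Drop the page's contents but keep the address range reserved
 * (analogous to vm_area_unmap_pages(area, addr, 1)). */
static int arena_unmap_page(void *addr, size_t page)
{
	void *p = mmap(addr, page, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}

/* Release the whole reservation (analogous to free_vm_area()). */
static int arena_release(void *base, size_t size)
{
	return munmap(base, size);
}
```

As with the vm area, the base address returned by arena_reserve() stays the same while pages are mapped and unmapped inside it, which is the property that lets the arena's start address be fixed early.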

For the BPF use case, area_size will be 4 GB plus 64 KB of guard pages,
and area->addr will be known and fixed at program verification time.
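The size arithmetic shows why these quantities need 64-bit types. A minimal check, assuming 4 KiB pages and reading 4 GB and 64 KB as binary units; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL       /* assumed 4 KiB pages */
#define ARENA_SIZE      (4ULL << 30)  /* 4 GB of arena proper */
#define GUARD_SIZE      (64ULL << 10) /* 64 KB of guard pages */

/* Total size of the vm area requested for the arena. */
static uint64_t arena_area_size(void)
{
	return ARENA_SIZE + GUARD_SIZE;
}

/* Number of PAGE_SIZE pages covering the arena proper (guard excluded). */
static uint64_t arena_pages(void)
{
	return ARENA_SIZE / MODEL_PAGE_SIZE;
}
```

The area size (4295032832 bytes) exceeds UINT32_MAX, so it has to be computed in a 64-bit type, as the patch does by widening count to unsigned long before multiplying (unsigned long is 64 bits on 64-bit kernels). The page count for the 4 GB arena itself (1048576) still fits in the unsigned int count parameter.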

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/vmalloc.h |  3 +++
 mm/vmalloc.c            | 46 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..7d112cc5f2a3 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -232,6 +232,9 @@ static inline bool is_vm_area_hugepages(const void *addr)
 }
 
 #ifdef CONFIG_MMU
+int vm_area_map_pages(struct vm_struct *area, unsigned long addr, unsigned int count,
+		      struct page **pages);
+int vm_area_unmap_pages(struct vm_struct *area, unsigned long addr, unsigned int count);
 void vunmap_range(unsigned long addr, unsigned long end);
 static inline void set_vm_flush_reset_perms(void *addr)
 {
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d12a17fc0c17..d6337d46f1d8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -635,6 +635,52 @@ static int vmap_pages_range(unsigned long addr, unsigned long end,
 	return err;
 }
 
+/**
+ * vm_area_map_pages - map pages inside given vm_area
+ * @area: vm_area
+ * @addr: start address inside vm_area
+ * @count: number of pages
+ * @pages: pages to map (always PAGE_SIZE pages)
+ */
+int vm_area_map_pages(struct vm_struct *area, unsigned long addr, unsigned int count,
+		      struct page **pages)
+{
+	unsigned long size = ((unsigned long)count) * PAGE_SIZE;
+	unsigned long end = addr + size;
+
+	might_sleep();
+	if (WARN_ON_ONCE(area->flags & VM_FLUSH_RESET_PERMS))
+		return -EINVAL;
+	if (WARN_ON_ONCE(area->flags & VM_NO_GUARD))
+		return -EINVAL;
+	if (WARN_ON_ONCE(!(area->flags & VM_MAP)))
+		return -EINVAL;
+	if (count > totalram_pages())
+		return -E2BIG;
+	if (addr < (unsigned long)area->addr || (void *)end > area->addr + get_vm_area_size(area))
+		return -ERANGE;
+
+	return vmap_pages_range(addr, end, PAGE_KERNEL, pages, PAGE_SHIFT);
+}
+
+/**
+ * vm_area_unmap_pages - unmap pages inside given vm_area
+ * @area: vm_area
+ * @addr: start address inside vm_area
+ * @count: number of pages to unmap
+ */
+int vm_area_unmap_pages(struct vm_struct *area, unsigned long addr, unsigned int count)
+{
+	unsigned long size = ((unsigned long)count) * PAGE_SIZE;
+	unsigned long end = addr + size;
+
+	if (addr < (unsigned long)area->addr || (void *)end > area->addr + get_vm_area_size(area))
+		return -ERANGE;
+
+	vunmap_range(addr, end);
+	return 0;
+}
+
 int is_vmalloc_or_module_addr(const void *x)
 {
 	/*
-- 
2.34.1


Thread overview: 7+ messages
2024-02-20 19:26 Alexei Starovoitov [this message]
2024-02-21  5:52 ` [PATCH bpf-next] mm: Introduce vm_area_[un]map_pages() Christoph Hellwig
2024-02-21 19:05   ` Alexei Starovoitov
2024-02-22 23:25     ` Alexei Starovoitov
2024-02-24  0:00       ` Alexei Starovoitov
2024-02-23 17:14     ` Christoph Hellwig
2024-02-23 17:27       ` Alexei Starovoitov
