linux-kernel.vger.kernel.org archive mirror
* RFC: vmalloc improvements
@ 2001-02-24  0:26 Reto Baettig
  2001-02-24  0:32 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Reto Baettig @ 2001-02-24  0:26 UTC (permalink / raw)
  To: MM Linux, Kernel Linux; +Cc: Martin Frey, baettig

Hi

We have an application that makes extensive use of vmalloc (we need
lots of large, virtually contiguous buffers. The buffers don't have to be
physically contiguous).

vmalloc/vfree is very slow when the vmlist gets long.

I don't know if this problem is already on a todo list or if we are the
first ones who want to use vmalloc extensively. Maybe we're also missing
something.

We would volunteer to improve vmalloc if there is any chance of
getting it into the main kernel tree. We also have an idea how we
could do that (quite similar to the process address space management):

1.      Create a generic AVL tree header file (similar to list.h)

2.      We change the vm_struct to something like:

struct vm_struct {
        unsigned long flags;
        void * addr;
        unsigned long size;
        struct avl_entry avl;
        struct list_head empty_list;
        struct list_head vm_list;
};

with struct avl_entry:

struct avl_entry {
        unsigned long key;              /* sort key: the area's start address */
        short height;                   /* subtree height, used for rebalancing */
        struct avl_entry *avl_left;
        struct avl_entry *avl_right;
};

3.      We have an AVL tree (vm_avl_used) for the used memory areas (sorted
by address), a hash table for the unused memory areas (vm_hash_unused,
hashed by size) and a sorted linear list (vm_list) of all the memory
areas (used and unused). The vm_hash_unused hash table is initially empty
and only gets filled when previously used areas are freed and the memory
space becomes fragmented.

4.      When we free an area, we first find it in the AVL tree. Once we
have the vm_struct, we can check vm_list for direct neighbours. If a
neighbour is also free, the areas get merged.

5.      When we have to allocate a new area (get_free_area)
and the hash table cannot satisfy the request, we allocate a new area
starting after the end of the used memory areas.
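
Steps 3 to 5 can be sketched roughly as follows. This is a minimal
user-space model, not kernel code: struct area, area_alloc(), area_free()
and next_free are hypothetical names, and plain linear scans stand in
for both the AVL lookup (vm_avl_used) and the size-hashed lookup
(vm_hash_unused), which only accepts an exact size match here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model of the proposal; not kernel code. */
struct area {
        unsigned long addr, size;
        int used;
        struct area *prev, *next;       /* address-sorted list of all areas */
};

static struct area *head, *tail;
static unsigned long next_free = 0x1000; /* end of the space handed out so far */

/* Step 5: reuse a freed area of the same size, else append a new one. */
static struct area *area_alloc(unsigned long size)
{
        struct area *a;

        /* Linear scan stands in for the lookup in vm_hash_unused. */
        for (a = head; a; a = a->next) {
                if (!a->used && a->size == size) {
                        a->used = 1;
                        return a;
                }
        }

        a = malloc(sizeof(*a));
        if (!a)
                return NULL;
        a->addr = next_free;
        a->size = size;
        a->used = 1;
        a->prev = tail;
        a->next = NULL;
        if (tail)
                tail->next = a;
        else
                head = a;
        tail = a;
        next_free += size;
        return a;
}

/* Fold a->next into a; both must be free and adjacent in the list. */
static void merge_next(struct area *a)
{
        struct area *b = a->next;

        assert(b && !a->used && !b->used);
        a->size += b->size;
        a->next = b->next;
        if (b->next)
                b->next->prev = a;
        else
                tail = a;
        free(b);
}

/* Step 4: find the area by address, free it, merge with free neighbours. */
static int area_free(unsigned long addr)
{
        struct area *a;

        /* Linear scan stands in for the lookup in vm_avl_used. */
        for (a = head; a; a = a->next)
                if (a->used && a->addr == addr)
                        break;
        if (!a)
                return -1;

        a->used = 0;
        if (a->next && !a->next->used)
                merge_next(a);
        if (a->prev && !a->prev->used)
                merge_next(a->prev);    /* a itself is freed here */
        return 0;
}
```

Because new areas are only ever appended at next_free and neighbours are
merged on free, list neighbours are always address neighbours, which is
what makes the merge in step 4 a constant-time check once the area has
been found.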

Is this something that makes sense to do and that could make it
into the 2.4 or the 2.5 kernel?

Reto



* Re: RFC: vmalloc improvements
  2001-02-24  0:26 RFC: vmalloc improvements Reto Baettig
@ 2001-02-24  0:32 ` Ingo Molnar
  2001-02-27  0:50   ` Reto Baettig
  2001-02-27  0:56   ` David S. Miller
  2001-02-24  1:01 ` Linus Torvalds
  2001-02-24  1:09 ` Alan Cox
  2 siblings, 2 replies; 6+ messages in thread
From: Ingo Molnar @ 2001-02-24  0:32 UTC (permalink / raw)
  To: Reto Baettig; +Cc: MM Linux, Kernel Linux, Martin Frey


On Fri, 23 Feb 2001, Reto Baettig wrote:

> We have an application that makes extensive use of vmalloc (we need
> lots of large, virtually contiguous buffers. The buffers don't have to be
> physically contiguous).

Question: what is this application, and why does it need so much virtual
memory? vmalloc()-able memory is capped at 128 MB right now, and
increasing it conflicts with directly mapping RAM, so generally it's a
good idea to avoid vmalloc() as much as possible.

	Ingo



* Re: RFC: vmalloc improvements
  2001-02-24  0:26 RFC: vmalloc improvements Reto Baettig
  2001-02-24  0:32 ` Ingo Molnar
@ 2001-02-24  1:01 ` Linus Torvalds
  2001-02-24  1:09 ` Alan Cox
  2 siblings, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2001-02-24  1:01 UTC (permalink / raw)
  To: linux-kernel

In article <200102240026.QAA09446@k2.llnl.gov>,
Reto Baettig  <baettig@k2.llnl.gov> wrote:
>
>We would volunteer to improve vmalloc if there is any chance of
>getting it into the main kernel tree. We also have an idea how we
>could do that (quite similar to the process address space management):
>
>1.      Create a generic AVL tree header file (similar to list.h)
....

No thanks.

Just use the process address space management as-is, and make the
vmalloc address list be the same as any other address list: it would just
be the "native" address list for "init_mm".

You could probably even use "insert_vm_struct()" directly, and have that
do the AVL tree stuff for you, no changes needed.

>Is this something that makes sense to do and that could make it
>into the 2.4 or the 2.5 kernel?

It's definitely not a 2.4.x thing.

		Linus


* Re: RFC: vmalloc improvements
  2001-02-24  0:26 RFC: vmalloc improvements Reto Baettig
  2001-02-24  0:32 ` Ingo Molnar
  2001-02-24  1:01 ` Linus Torvalds
@ 2001-02-24  1:09 ` Alan Cox
  2 siblings, 0 replies; 6+ messages in thread
From: Alan Cox @ 2001-02-24  1:09 UTC (permalink / raw)
  To: baettig; +Cc: MM Linux, Kernel Linux, Martin Frey

> We have an application that makes extensive use of vmalloc (we need
> lots of large, virtually contiguous buffers. The buffers don't have to be
> physically contiguous).

So you could actually code around that. If you need them virtually contiguous
for mmap, for example, then you can actually mmap arbitrary page arrays.

> We would volunteer to improve vmalloc if there is any chance of
> getting it into the main kernel tree. We also have an idea how we
> could do that (quite similar to the process address space management):

I'm not the one to call the shots, but it seems that if you need an AVL tree
for the vmalloc tables then vmalloc is possibly being overused, or people are
not allocating buffers just occasionally as anticipated.


* Re: RFC: vmalloc improvements
  2001-02-24  0:32 ` Ingo Molnar
@ 2001-02-27  0:50   ` Reto Baettig
  2001-02-27  0:56   ` David S. Miller
  1 sibling, 0 replies; 6+ messages in thread
From: Reto Baettig @ 2001-02-27  0:50 UTC (permalink / raw)
  To: mingo; +Cc: MM Linux, Kernel Linux, Martin Frey

Ingo Molnar wrote:
> Question: what is this application, and why does it need so much virtual
> memory? vmalloc()-able memory is capped at 128 MB right now, and
> increasing it conflicts with directly mapping RAM, so generally it's a
> good idea to avoid vmalloc() as much as possible.

We implemented an RPC mechanism over a fast network in the kernel. The
end application is a distributed filesystem. The RPC server needs lots
of 2 MB receive buffers, which are allocated using vmalloc because the NIC
has its own page tables.
The buffers then get handed to the consumers (lots of threads), which
eventually free them. This way, we get a throughput of 200 MB/s at the
RPC layer.

The 128 MB limit is probably an Intel limitation, since we don't see it on
our Alpha machines (Linux 2.2.18 Alpha SMP).

Reto


* Re: RFC: vmalloc improvements
  2001-02-24  0:32 ` Ingo Molnar
  2001-02-27  0:50   ` Reto Baettig
@ 2001-02-27  0:56   ` David S. Miller
  1 sibling, 0 replies; 6+ messages in thread
From: David S. Miller @ 2001-02-27  0:56 UTC (permalink / raw)
  To: Reto Baettig; +Cc: mingo, MM Linux, Kernel Linux, Martin Frey


Reto Baettig writes:
 > The RPC server needs lots of 2 MB receive buffers which are
 > allocated using vmalloc because the NIC has its own page tables.

Why not just allocate the pages separately and keep track of
where they are? Since the NIC has all the page tabling facilities
on its end, the CPU side is just a software issue.  You can keep
an array of pages, however large you need, to keep track of that.

vmalloc() was never meant to be used on this level and doing
so is asking for trouble (it's also deadly expensive on SMP due
to the cross-CPU TLB invalidates that using vmalloc() causes).
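
The page-array idea can be sketched along these lines. This is a
user-space illustration under stated assumptions, not driver code:
malloc() stands in for a kernel page allocator, and PAGE_SZ,
struct page_buf, page_buf_alloc(), page_buf_copy() and page_buf_free()
are hypothetical names. The buffer is never virtually contiguous; the
copy helper just walks the array and splits at page boundaries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096UL  /* assumed page size for this illustration */

/* A large buffer kept as an array of separately allocated pages. */
struct page_buf {
        size_t npages;
        unsigned char **pages;
};

static void page_buf_free(struct page_buf *pb)
{
        size_t i;

        if (!pb)
                return;
        if (pb->pages)
                for (i = 0; i < pb->npages; i++)
                        free(pb->pages[i]);     /* free(NULL) is a no-op */
        free(pb->pages);
        free(pb);
}

static struct page_buf *page_buf_alloc(size_t bytes)
{
        struct page_buf *pb = calloc(1, sizeof(*pb));
        size_t i;

        if (!pb)
                return NULL;
        pb->npages = (bytes + PAGE_SZ - 1) / PAGE_SZ;   /* round up */
        pb->pages = calloc(pb->npages, sizeof(*pb->pages));
        if (!pb->pages) {
                free(pb);
                return NULL;
        }
        for (i = 0; i < pb->npages; i++) {
                pb->pages[i] = malloc(PAGE_SZ); /* pages need not be adjacent */
                if (!pb->pages[i]) {
                        page_buf_free(pb);
                        return NULL;
                }
        }
        return pb;
}

/*
 * Copy len bytes between a flat buffer and the page array, starting at
 * byte offset off. to_pages != 0 writes into the pages, 0 reads from them.
 * Returns 0 on success, -1 if the range does not fit.
 */
static int page_buf_copy(struct page_buf *pb, size_t off, void *flat,
                         size_t len, int to_pages)
{
        unsigned char *f = flat;
        size_t total;

        assert(pb != NULL);
        total = pb->npages * PAGE_SZ;
        if (off > total || len > total - off)
                return -1;

        while (len) {
                size_t idx = off / PAGE_SZ;     /* which page */
                size_t in = off % PAGE_SZ;      /* offset inside that page */
                size_t n = PAGE_SZ - in;        /* bytes left in that page */

                if (n > len)
                        n = len;
                if (to_pages)
                        memcpy(pb->pages[idx] + in, f, n);
                else
                        memcpy(f, pb->pages[idx] + in, n);
                f += n;
                off += n;
                len -= n;
        }
        return 0;
}
```

With these assumptions a 2 MB receive buffer becomes 512 independent
pages, so allocating or freeing one touches no shared virtual mapping.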

Later,
David S. Miller
davem@redhat.com

