* Umem Questions
@ 2021-05-14 14:34 Dan Siemon
  2021-05-17  8:21 ` Magnus Karlsson
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Siemon @ 2021-05-14 14:34 UTC (permalink / raw)
  To: Xdp

I've been trying to work with large Umem areas and have a few
questions. I'd appreciate any help or pointers. If it makes any
difference, my AF_XDP testing is with i40e.

1) I use kernel args to reserve huge pages on boot. The application
mmap call with the huge TLB flag appears to use huge pages as I can see
the count of used huge pages go up (/proc/meminfo). However, the number
of pages used by the umem, as shown in ss output, looks to still be 4k
pages. Are there plans to support huge pages in Umem? How hard would
this be?

2) It looks like there is a limit of 2GB on the maximum Umem size? I've
tried with and without huge pages. Is this fundamental? How hard would
it be to increase this?
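
In case it helps to reproduce either of these, the relevant part of
my setup looks roughly like the sketch below. It is a sketch rather
than a verbatim copy of my application: the size, the mmap() flags
and the NULL (default) umem config are illustrative, and I am using
libbpf's xsk.h for xsk_umem__create().

#include <errno.h>
#include <sys/mman.h>
#include <bpf/xsk.h>

#define UMEM_SIZE (2ULL * 1024 * 1024 * 1024)  /* 2 GiB for illustration */

static struct xsk_umem *umem;
static struct xsk_ring_prod fq;
static struct xsk_ring_cons cq;

static int setup_umem(void)
{
        void *area;

        /* Backed by the huge pages reserved on the kernel command line;
         * HugePages_Free in /proc/meminfo drops as expected after this. */
        area = mmap(NULL, UMEM_SIZE, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (area == MAP_FAILED)
                return -errno;

        /* ss still reports the registered umem as 4k pages (question 1),
         * and sizes beyond ~2 GB fail here (question 2). A NULL config
         * means the libbpf defaults. */
        return xsk_umem__create(&umem, area, UMEM_SIZE, &fq, &cq, NULL);
}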

For both of these, I'd like to try to help make them happen. If the
kernel side changes are deep or large, it may be beyond me but I can
offer lab equipment and testing.

Thanks.



* Re: Umem Questions
  2021-05-14 14:34 Umem Questions Dan Siemon
@ 2021-05-17  8:21 ` Magnus Karlsson
  2021-05-19 21:09   ` Dan Siemon
  0 siblings, 1 reply; 7+ messages in thread
From: Magnus Karlsson @ 2021-05-17  8:21 UTC (permalink / raw)
  To: Dan Siemon, Björn Töpel; +Cc: Xdp

On Fri, May 14, 2021 at 9:21 PM Dan Siemon <dan@coverfire.com> wrote:
>
> I've been trying to work with large Umem areas and have a few questions
> . I'd appreciate any help or pointers. If it makes any difference, my
> AF_XDP testing is with i40e.

These issues are driver independent, but I appreciate that you
reported it. As you are very well aware, some things are driver
dependent.

> 1) I use kernel args to reserve huge pages on boot. The application
> mmap call with the huge TLB flag appears to use huge pages as I can see
> the count of used huge pages go up (/proc/meminfo). However, the number
> of pages used by the umem, as shown in ss output, looks to still be 4k
> pages. Are there plans to support huge pages in Umem? How hard would
> this be?

Something similar has been on the todo list for two years, but sadly
neither Björn nor I have had any time to pick this up, and I cannot
see myself having the time in the foreseeable future either. There
are at least three problems that would have to be addressed in this
area:

1: Using a huge page for the umem kernel mapping. As you have
allocated the area from huge pages, it will be physically contiguous.
2: Making sure the DMA addresses are physically contiguous
3: Using a huge page for the IOMMU and its DMA mappings

#1 and #3 are hard problems, at least in my mind. I am no mm or iommu
guy, but I do not believe the kernel currently supports using huge
pages for its own kernel mappings like this; it will break huge pages
down into 4K pages for its own mappings. If I am incorrect, I hope
that someone reading this will correct me. But we should do some
mailing list browsing here to see what the latest thoughts are and
what has been tried before.

As for #2, Björn had some discussions with the iommu maintainer about
this in the past [1]. There is no such interface in the iommu
subsystem today, but components such as graphics drivers use a "hack"
to make sure that this happens, and fail if it does not. We do not
have to fail, as we can always fall back to the method we have today.
Today we have an array (dma_addr_t *dma_pages) that stores the
addresses of all the 4K DMA regions. With this new interface in
place, we could replace the array with a single address pointing to
the start of the area, improving performance. #2 is a prerequisite
for #3 too. Christoph Hellwig submitted an interface proposal about a
year ago [1], but nobody has taken on the challenge of implementing
it.

[1] https://lkml.org/lkml/2020/7/8/131
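
To make the fast-path difference concrete, here is a rough sketch of
what the lookup means today versus with a guaranteed contiguous
mapping. It is simplified and not the exact kernel code; dma_base is
a hypothetical field that such an interface would give us.

/* Today: one dma_addr_t per 4K page of the umem, looked up on every
 * address translation. */
static inline dma_addr_t umem_to_dma_today(struct xsk_buff_pool *pool, u64 addr)
{
        return pool->dma_pages[addr >> PAGE_SHIFT] + (addr & ~PAGE_MASK);
}

/* With a contiguous DMA mapping: no per-page array, just one base. */
static inline dma_addr_t umem_to_dma_contig(struct xsk_buff_pool *pool, u64 addr)
{
        return pool->dma_base + addr;
}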

> 2) It looks like there is a limit of 2GB on the maximum Umem size? I've
> tried with and without huge pages. Is this fundamental? How hard would
> it be to increase this?

This was news to me. Do you know where in the xdp_umem_reg code it
complains about this? I guess it is xsk_umem__create() that fails,
right? The only limit I see from a basic inspection of the code is
that the number of packet buffers cannot be larger than a u32 (4G),
but you are nowhere near that limit.

Björn, do you know where this limit stems from?

Thanks: Magnus


> For both of these, I'd like to try to help make them happen. If the
> kernel side changes are deep or large, it may be beyond me but I can
> offer lab equipment and testing.
>
> Thanks.
>


* Re: Umem Questions
  2021-05-17  8:21 ` Magnus Karlsson
@ 2021-05-19 21:09   ` Dan Siemon
  2021-05-20  6:14     ` Magnus Karlsson
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Siemon @ 2021-05-19 21:09 UTC (permalink / raw)
  To: Magnus Karlsson, Björn Töpel; +Cc: Xdp


> > 2) It looks like there is a limit of 2GB on the maximum Umem size?
> > I've
> > tried with and without huge pages. Is this fundamental? How hard
> > would
> > it be to increase this?
> 
> This was news to me. Do you know where in the xdp_umem_reg code it
> complains about this? I guess it is xsk_umem__create() that fails,
> or?
> The only limit I see from a basic inspection of the code is that the
> number of packet buffers cannot be larger than a u32 (4G). But you
> are
> not close to that limit.

Yes, the failure is in xsk_umem__create(). I don't know where
specifically, but there are a couple of spots on the kernel side of
that path which return ENOMEM, and ENOMEM is the error I get back.



* Re: Umem Questions
  2021-05-19 21:09   ` Dan Siemon
@ 2021-05-20  6:14     ` Magnus Karlsson
  2021-05-20 12:40       ` Magnus Karlsson
  0 siblings, 1 reply; 7+ messages in thread
From: Magnus Karlsson @ 2021-05-20  6:14 UTC (permalink / raw)
  To: Dan Siemon; +Cc: Björn Töpel, Xdp

On Wed, May 19, 2021 at 11:09 PM Dan Siemon <dan@coverfire.com> wrote:
>
>
> > > 2) It looks like there is a limit of 2GB on the maximum Umem size?
> > > I've
> > > tried with and without huge pages. Is this fundamental? How hard
> > > would
> > > it be to increase this?
> >
> > This was news to me. Do you know where in the xdp_umem_reg code it
> > complains about this? I guess it is xsk_umem__create() that fails,
> > or?
> > The only limit I see from a basic inspection of the code is that the
> > number of packet buffers cannot be larger than a u32 (4G). But you
> > are
> > not close to that limit.
>
> Yes, the failure is in xsk_umem__create(). I don't know where
> specifically but there are a couple spots in kernel side of that which
> return ENOMEM which is the return value.

Can you issue a "ulimit -a" on your system and share the result? Just
to verify that there is no per-process limit kicking in.


* Re: Umem Questions
  2021-05-20  6:14     ` Magnus Karlsson
@ 2021-05-20 12:40       ` Magnus Karlsson
  2021-05-20 12:50         ` Magnus Karlsson
  0 siblings, 1 reply; 7+ messages in thread
From: Magnus Karlsson @ 2021-05-20 12:40 UTC (permalink / raw)
  To: Dan Siemon; +Cc: Björn Töpel, Xdp

On Thu, May 20, 2021 at 8:14 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Wed, May 19, 2021 at 11:09 PM Dan Siemon <dan@coverfire.com> wrote:
> >
> >
> > > > 2) It looks like there is a limit of 2GB on the maximum Umem size?
> > > > I've
> > > > tried with and without huge pages. Is this fundamental? How hard
> > > > would
> > > > it be to increase this?
> > >
> > > This was news to me. Do you know where in the xdp_umem_reg code it
> > > complains about this? I guess it is xsk_umem__create() that fails,
> > > or?
> > > The only limit I see from a basic inspection of the code is that the
> > > number of packet buffers cannot be larger than a u32 (4G). But you
> > > are
> > > not close to that limit.
> >
> > Yes, the failure is in xsk_umem__create(). I don't know where
> > specifically but there are a couple spots in kernel side of that which
> > return ENOMEM which is the return value.

I think I have found it.

static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
{
        unsigned int gup_flags = FOLL_WRITE;
        long npgs;
        int err;

        umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
                            GFP_KERNEL | __GFP_NOWARN);
        if (!umem->pgs)
                return -ENOMEM;

The pgs array becomes too large to allocate with kcalloc(). It needs
to be turned into a kvcalloc() so that it can fall back to vmalloc()
for requests this large. Will spin a patch.
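
Rough arithmetic, assuming 4 KiB pages and 8-byte struct page
pointers, so treat the exact numbers with some care:

  2 GiB umem       -> 2 GiB / 4 KiB     = 524,288 pinned pages
  umem->pgs array  -> 524,288 * 8 bytes = 4 MiB

A 4 MiB kcalloc() needs an order-10 physically contiguous allocation,
which is the largest chunk the page allocator will even attempt, so
anything above roughly 2 GiB of umem cannot be satisfied this way at
all, and sizes around that point can easily fail once memory is
fragmented. kvcalloc() removes the requirement that the array itself
be physically contiguous.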

Thanks: Magnus

> Can you issue a "ulimit -a" on your system and share the result? Just
> to verify that there is no per process limit that kicks in.


* Re: Umem Questions
  2021-05-20 12:40       ` Magnus Karlsson
@ 2021-05-20 12:50         ` Magnus Karlsson
  2021-05-21 18:17           ` Dan Siemon
  0 siblings, 1 reply; 7+ messages in thread
From: Magnus Karlsson @ 2021-05-20 12:50 UTC (permalink / raw)
  To: Dan Siemon; +Cc: Björn Töpel, Xdp

On Thu, May 20, 2021 at 2:40 PM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Thu, May 20, 2021 at 8:14 AM Magnus Karlsson
> <magnus.karlsson@gmail.com> wrote:
> >
> > On Wed, May 19, 2021 at 11:09 PM Dan Siemon <dan@coverfire.com> wrote:
> > >
> > >
> > > > > 2) It looks like there is a limit of 2GB on the maximum Umem size?
> > > > > I've
> > > > > tried with and without huge pages. Is this fundamental? How hard
> > > > > would
> > > > > it be to increase this?
> > > >
> > > > This was news to me. Do you know where in the xdp_umem_reg code it
> > > > complains about this? I guess it is xsk_umem__create() that fails,
> > > > or?
> > > > The only limit I see from a basic inspection of the code is that the
> > > > number of packet buffers cannot be larger than a u32 (4G). But you
> > > > are
> > > > not close to that limit.
> > >
> > > Yes, the failure is in xsk_umem__create(). I don't know where
> > > specifically but there are a couple spots in kernel side of that which
> > > return ENOMEM which is the return value.
>
> I think I have found it.
>
> static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
> {
>         unsigned int gup_flags = FOLL_WRITE;
>         long npgs;
>         int err;
>
>         umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
>                             GFP_KERNEL | __GFP_NOWARN);
>         if (!umem->pgs)
>                 return -ENOMEM;
>
> This structure becomes too large to allocate with kcalloc(). It needs
> to be turned into a kvcalloc() so that it can use vmalloc instead for
> requests that are this large. Will spin a patch.
>
> Thanks: Magnus

Could you please check if this one fixes it for you?

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 56a28a686988..f01ef6bda390 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -27,7 +27,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
        unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);

-       kfree(umem->pgs);
+       kvfree(umem->pgs);
        umem->pgs = NULL;
 }

@@ -99,8 +99,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
        long npgs;
        int err;

-       umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
-                           GFP_KERNEL | __GFP_NOWARN);
+       umem->pgs = kvcalloc(umem->npgs, sizeof(*umem->pgs), GFP_KERNEL | __GFP_NOWARN);
        if (!umem->pgs)
                return -ENOMEM;

@@ -123,7 +122,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
 out_pin:
        xdp_umem_unpin_pages(umem);
 out_pgs:
-       kfree(umem->pgs);
+       kvfree(umem->pgs);
        umem->pgs = NULL;
        return err;
 }


> > Can you issue a "ulimit -a" on your system and share the result? Just
> > to verify that there is no per process limit that kicks in.


* Re: Umem Questions
  2021-05-20 12:50         ` Magnus Karlsson
@ 2021-05-21 18:17           ` Dan Siemon
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Siemon @ 2021-05-21 18:17 UTC (permalink / raw)
  To: Magnus Karlsson; +Cc: Björn Töpel, Xdp

> 
> Could you please check if this one fixes it for you?
> 
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 56a28a686988..f01ef6bda390 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -27,7 +27,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
>  {
>         unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
> 
> -       kfree(umem->pgs);
> +       kvfree(umem->pgs);
>         umem->pgs = NULL;
>  }
> 
> @@ -99,8 +99,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
>         long npgs;
>         int err;
> 
> -       umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
> -                           GFP_KERNEL | __GFP_NOWARN);
> +       umem->pgs = kvcalloc(umem->npgs, sizeof(*umem->pgs), GFP_KERNEL | __GFP_NOWARN);
>         if (!umem->pgs)
>                 return -ENOMEM;
> 
> @@ -123,7 +122,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
>  out_pin:
>         xdp_umem_unpin_pages(umem);
>  out_pgs:
> -       kfree(umem->pgs);
> +       kvfree(umem->pgs);
>         umem->pgs = NULL;
>         return err;
>  }

Hi Magnus,

Thanks for looking at this. Yes, this patch allows me to create a much
larger Umem. I tried up to 32 GB.
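
(If my arithmetic is right, 32 GB works out to roughly 8.4 million
pinned 4 KiB pages, so the pgs array is on the order of 64 MB, which
is easy for kvcalloc()/vmalloc but far beyond anything kcalloc()
could return. That fits the old ~2 GB ceiling.)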

I have not yet run any tests that extensively use the larger Umem.

Thanks again.


