* Umem Questions
From: Dan Siemon @ 2021-05-14 14:34 UTC
To: Xdp

I've been trying to work with large Umem areas and have a few
questions. I'd appreciate any help or pointers. If it makes any
difference, my AF_XDP testing is with i40e.

1) I use kernel args to reserve huge pages on boot. The application
mmap call with the huge TLB flag appears to use huge pages, as I can
see the count of used huge pages go up (/proc/meminfo). However, the
number of pages used by the umem, as shown in ss output, looks to
still be 4K pages. Are there plans to support huge pages in Umem? How
hard would this be?

2) It looks like there is a limit of 2GB on the maximum Umem size?
I've tried with and without huge pages. Is this fundamental? How hard
would it be to increase this?

For both of these, I'd like to try to help make them happen. If the
kernel-side changes are deep or large, it may be beyond me, but I can
offer lab equipment and testing.

Thanks.
* Re: Umem Questions
From: Magnus Karlsson @ 2021-05-17 8:21 UTC
To: Dan Siemon, Björn Töpel; +Cc: Xdp

On Fri, May 14, 2021 at 9:21 PM Dan Siemon <dan@coverfire.com> wrote:
>
> I've been trying to work with large Umem areas and have a few
> questions. I'd appreciate any help or pointers. If it makes any
> difference, my AF_XDP testing is with i40e.

These issues are driver independent, but I appreciate the report. As
you are well aware, some things are driver dependent.

> 1) I use kernel args to reserve huge pages on boot. The application
> mmap call with the huge TLB flag appears to use huge pages, as I can
> see the count of used huge pages go up (/proc/meminfo). However, the
> number of pages used by the umem, as shown in ss output, looks to
> still be 4K pages. Are there plans to support huge pages in Umem?
> How hard would this be?

Something similar has been on the todo list for two years, but sadly
neither Björn nor I have had any time to pick this up, and I cannot
see myself having the time in the foreseeable future either. There are
at least three problems that would have to be addressed in this area:

1: Using a huge page for the umem kernel mapping. As you have
   allocated the area with a huge page, it will be physically
   contiguous.
2: Making sure DMA addresses are physically contiguous.
3: Using a huge page for the IOMMU and its DMA mappings.

#1 and #3 are hard problems, at least in my mind. I am no mm or iommu
guy, but I do not believe the kernel currently supports this for its
own kernel mappings; it will break huge pages down into 4K pages for
those. If I am incorrect, I hope that someone reading this will
correct me. But we should do some mailing list browsing here to see
what the latest thoughts are and what has been tried before.
As for #2, Björn had some discussions with the iommu maintainer about
this in the past [1]. There is no such interface in the iommu
subsystem today, but components such as graphics drivers use a "hack"
to make sure that this happens, and fail if it does not. We do not
have to fail, as we can always fall back to the method we have today.
Today we have an array (dma_addr_t *dma_pages) that stores the
addresses of all the 4K DMA regions. With this new interface in place,
we could replace the array with a single address pointing to the start
of the area, improving performance. #2 is a prerequisite for #3 too.
Christoph Hellwig submitted an interface proposal about a year ago
[1], but nobody has taken on the challenge to implement it.

[1] https://lkml.org/lkml/2020/7/8/131

> 2) It looks like there is a limit of 2GB on the maximum Umem size?
> I've tried with and without huge pages. Is this fundamental? How
> hard would it be to increase this?

This was news to me. Do you know where in the xdp_umem_reg code it
complains about this? I guess it is xsk_umem__create() that fails, or?
The only limit I see from a basic inspection of the code is that the
number of packet buffers cannot be larger than a u32 (4G). But you are
not close to that limit.

Björn, do you know where this limit stems from?

Thanks: Magnus

> For both of these, I'd like to try to help make them happen. If the
> kernel-side changes are deep or large, it may be beyond me, but I
> can offer lab equipment and testing.
>
> Thanks.
* Re: Umem Questions
From: Dan Siemon @ 2021-05-19 21:09 UTC
To: Magnus Karlsson, Björn Töpel; +Cc: Xdp

> > 2) It looks like there is a limit of 2GB on the maximum Umem size?
> > I've tried with and without huge pages. Is this fundamental? How
> > hard would it be to increase this?
>
> This was news to me. Do you know where in the xdp_umem_reg code it
> complains about this? I guess it is xsk_umem__create() that fails,
> or? The only limit I see from a basic inspection of the code is that
> the number of packet buffers cannot be larger than a u32 (4G). But
> you are not close to that limit.

Yes, the failure is in xsk_umem__create(). I don't know where
specifically, but there are a couple of spots on the kernel side of
that path which return ENOMEM, which is the return value I see.
* Re: Umem Questions
From: Magnus Karlsson @ 2021-05-20 6:14 UTC
To: Dan Siemon; +Cc: Björn Töpel, Xdp

On Wed, May 19, 2021 at 11:09 PM Dan Siemon <dan@coverfire.com> wrote:
>
> Yes, the failure is in xsk_umem__create(). I don't know where
> specifically, but there are a couple of spots on the kernel side of
> that path which return ENOMEM, which is the return value I see.

Can you issue a "ulimit -a" on your system and share the result? Just
to verify that there is no per-process limit that kicks in.
* Re: Umem Questions
From: Magnus Karlsson @ 2021-05-20 12:40 UTC
To: Dan Siemon; +Cc: Björn Töpel, Xdp

> > Yes, the failure is in xsk_umem__create(). I don't know where
> > specifically, but there are a couple of spots on the kernel side
> > of that path which return ENOMEM, which is the return value I see.

I think I have found it:

static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
{
	unsigned int gup_flags = FOLL_WRITE;
	long npgs;
	int err;

	umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
			    GFP_KERNEL | __GFP_NOWARN);
	if (!umem->pgs)
		return -ENOMEM;

This array becomes too large to allocate with kcalloc(). It needs to
be turned into a kvcalloc() so that it falls back to vmalloc() for
requests this large. Will spin a patch.

Thanks: Magnus

> Can you issue a "ulimit -a" on your system and share the result?
> Just to verify that there is no per-process limit that kicks in.
* Re: Umem Questions
From: Magnus Karlsson @ 2021-05-20 12:50 UTC
To: Dan Siemon; +Cc: Björn Töpel, Xdp

> I think I have found it:
>
> static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
> {
> 	unsigned int gup_flags = FOLL_WRITE;
> 	long npgs;
> 	int err;
>
> 	umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
> 			    GFP_KERNEL | __GFP_NOWARN);
> 	if (!umem->pgs)
> 		return -ENOMEM;
>
> This array becomes too large to allocate with kcalloc(). It needs to
> be turned into a kvcalloc() so that it falls back to vmalloc() for
> requests this large. Will spin a patch.

Could you please check if this one fixes it for you?
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 56a28a686988..f01ef6bda390 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -27,7 +27,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
 	unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true);
 
-	kfree(umem->pgs);
+	kvfree(umem->pgs);
 	umem->pgs = NULL;
 }
 
@@ -99,8 +99,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
 	long npgs;
 	int err;
 
-	umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
-			    GFP_KERNEL | __GFP_NOWARN);
+	umem->pgs = kvcalloc(umem->npgs, sizeof(*umem->pgs), GFP_KERNEL | __GFP_NOWARN);
 	if (!umem->pgs)
 		return -ENOMEM;
 
@@ -123,7 +122,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
 out_pin:
 	xdp_umem_unpin_pages(umem);
 out_pgs:
-	kfree(umem->pgs);
+	kvfree(umem->pgs);
 	umem->pgs = NULL;
 	return err;
 }
* Re: Umem Questions
From: Dan Siemon @ 2021-05-21 18:17 UTC
To: Magnus Karlsson; +Cc: Björn Töpel, Xdp

> Could you please check if this one fixes it for you?
>
> [the net/xdp/xdp_umem.c patch from the previous message, converting
> kcalloc()/kfree() to kvcalloc()/kvfree(), snipped]

Hi Magnus,

Thanks for looking at this. Yes, this patch allows me to create a much
larger Umem. I tried up to 32 GB. I have not yet run any tests that
extensively use the larger Umem.

Thanks again.