* lib/scatterlist.c : sgl_alloc_order promises more than it delivers
From: Douglas Gilbert @ 2020-09-25 1:46 UTC
To: SCSI development list, linux-block
Cc: Bart Van Assche, Martin K. Petersen, USB list

The signature of this exported function is:

struct scatterlist *sgl_alloc_order(unsigned long long length,
                                    unsigned int order, bool chainable,
                                    gfp_t gfp, unsigned int *nent_p)

That first argument would be better named num_bytes (rather than length).
Its type (unsigned long long) seems to promise large allocations (is that
64 or 128 bits?). In practice the width doesn't matter, because of this
check in that function's definition:

        /* Check for integer overflow */
        if (length > (nent << (PAGE_SHIFT + order)))
                return NULL;

Well _integers_ don't wrap, but that pedantic point aside, 'nent' is an
unsigned int which means the rhs expression cannot represent 2^32 or
higher. So if length >= 2^32 the function fails (i.e. returns NULL).

On 8 GiB and 16 GiB machines I can easily build 6 or 12 GiB sgl_s (with
scsi_debug) but only if no single allocation is >= 4 GiB due to the
above check.

So is the above check intended to do that or is it a bug?

Any progress with the "[PATCH] sgl_alloc_order: memory leak" bug fix
posted on 20200920 ? sgl_free() is badly named as it leaks for order > 0 .

Doug Gilbert

PS1 vmalloc() which I would like to replace with sgl_alloc_order() in the
    scsi_debug driver, does not have a 4 GB limit.

PS2 Here are the users of sgl_free() under the drivers directory:

find . -name '*.c' -exec grep "sgl_free(" {} \; -print
                sgl_free(cmd->req.sg);
                sgl_free(cmd->req.sg);
                sgl_free(cmd->req.sg);
                sgl_free(cmd->req.sg);
./nvme/target/tcp.c
                sgl_free(req->sg);
                sgl_free(req->sg);
        sgl_free(req->metadata_sg);
./nvme/target/core.c
        sgl_free(fod->data_sg);
./nvme/target/fc.c
                        sgl_free(sgl);
./usb/usbip/stub_rx.c
        sgl_free(urb->sg);
        sgl_free(priv->sgl);
./usb/usbip/stub_main.c

* Re: lib/scatterlist.c : sgl_alloc_order promises more than it delivers
From: Bart Van Assche @ 2020-09-25 2:34 UTC
To: dgilbert, SCSI development list, linux-block
Cc: Martin K. Petersen, USB list

On 2020-09-24 18:46, Douglas Gilbert wrote:
>         /* Check for integer overflow */
>         if (length > (nent << (PAGE_SHIFT + order)))
>                 return NULL;
>
> Well _integers_ don't wrap, but that pedantic point aside, 'nent' is an
> unsigned int which means the rhs expression cannot represent 2^32 or
> higher. So if length >= 2^32 the function fails (i.e. returns NULL).
>
> On 8 GiB and 16 GiB machines I can easily build 6 or 12 GiB sgl_s (with
> scsi_debug) but only if no single allocation is >= 4 GiB due to the
> above check.
>
> So is the above check intended to do that or is it a bug?

The above check verifies that nent << (PAGE_SHIFT + order) ==
(uint64_t)nent << (PAGE_SHIFT + order). So I think it does what the
comment says it does.

Bart.

* Re: lib/scatterlist.c : sgl_alloc_order promises more than it delivers
From: Douglas Gilbert @ 2020-09-25 4:55 UTC
To: Bart Van Assche, SCSI development list, linux-block
Cc: Martin K. Petersen, USB list

On 2020-09-24 10:34 p.m., Bart Van Assche wrote:
> On 2020-09-24 18:46, Douglas Gilbert wrote:
>>         /* Check for integer overflow */
>>         if (length > (nent << (PAGE_SHIFT + order)))
>>                 return NULL;
>>
>> Well _integers_ don't wrap, but that pedantic point aside, 'nent' is an
>> unsigned int which means the rhs expression cannot represent 2^32 or
>> higher. So if length >= 2^32 the function fails (i.e. returns NULL).
>>
>> On 8 GiB and 16 GiB machines I can easily build 6 or 12 GiB sgl_s (with
>> scsi_debug) but only if no single allocation is >= 4 GiB due to the
>> above check.
>>
>> So is the above check intended to do that or is it a bug?
>
> The above check verifies that nent << (PAGE_SHIFT + order) ==
> (uint64_t)nent << (PAGE_SHIFT + order). So I think it does what the
> comment says it does.

I modified sgl_alloc_order() like this:

        /* Check for integer overflow */
        if (length > (nent << (PAGE_SHIFT + order))) {
                pr_info("%s: (length > (nent << (PAGE_SHIFT + order))\n",
                        __func__);
                return NULL;
        }
        ...

Then I tried starting scsi_debug with dev_size_mb=4096

This is what I saw in the log:

scsi_debug:scsi_debug_init: fixing max submit queue depth to host max queue depth, 32
sgl_alloc_order: (length > (nent << (PAGE_SHIFT + order))
message repeated 2 times: [sgl_alloc_order: (length > (nent << (PAGE_SHIFT + order))]
scsi_debug:sdeb_store_sgat: sdeb_store_sgat: unable to obtain 4096 MiB, last element size: 256 kiB
scsi_debug:sdebug_add_store: sgat: user data oom
scsi_debug:sdebug_add_store: sdebug_add_store: failed, errno=12

My code steps down from 1024 KiB elements on failure to 512 KiB and if that
fails it tries 256 KiB. Then it gives up. The log output is consistent with
my analysis. So your stated equality is an inequality when length >= 4 GiB.
There is no promotion of unsigned int nent to uint64_t .

You can write your own test harness if you don't believe me. The test machine
doesn't need much ram. Without the call to sgl_free() corrected, if it really
did try to get that much ram and failed toward the end, then (partially)
freed up what it had obtained, then you would see a huge memory leak ...

Now your intention seems to be that a 4 GiB sgl should be valid. Correct?
Can that check just be dropped?

Doug Gilbert

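The disagreement above can be reproduced in user space. The following is a
minimal sketch, not the kernel code: the demo_* names and DEMO_PAGE_SHIFT are
hypothetical, 4 KiB pages are assumed, and nent is assumed to be obtained by
rounding length up to whole elements and storing the result in a 32-bit
unsigned value. It contrasts the check as quoted in the thread with a
hypothetical variant that widens nent to 64 bits before shifting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SHIFT is architecture specific. */
#define DEMO_PAGE_SHIFT 12u

/* Elements needed for 'length' bytes, rounded up to whole elements. */
static uint32_t demo_nent(uint64_t length, unsigned int order)
{
	unsigned int shift = DEMO_PAGE_SHIFT + order;
	uint64_t elem_sz = (uint64_t)1 << shift;

	/* Truncation to 32 bits mirrors 'unsigned int nent' in the thread. */
	return (uint32_t)((length + elem_sz - 1) >> shift);
}

/* The check as quoted: the shift is evaluated in 32-bit arithmetic. */
static bool demo_rejects_32bit(uint64_t length, unsigned int order)
{
	uint32_t nent = demo_nent(length, order);

	return length > (uint32_t)(nent << (DEMO_PAGE_SHIFT + order));
}

/* Hypothetical variant: widen nent to 64 bits before shifting. */
static bool demo_rejects_64bit(uint64_t length, unsigned int order)
{
	uint32_t nent = demo_nent(length, order);

	return length > ((uint64_t)nent << (DEMO_PAGE_SHIFT + order));
}
```

Under these assumptions, with order 8 (1024 KiB elements) a 4 GiB request
gives nent = 4096; shifted left by 20 in 32-bit arithmetic that wraps to 0,
so the quoted check rejects the request while the widened variant accepts it.
A 2 GiB request passes both. The widened variant still rejects lengths so
large that nent itself no longer fits in 32 bits.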
* Re: lib/scatterlist.c : sgl_alloc_order promises more than it delivers
From: Bart Van Assche @ 2020-09-26 4:32 UTC
To: dgilbert, SCSI development list, linux-block
Cc: Martin K. Petersen, USB list

On 2020-09-24 21:55, Douglas Gilbert wrote:
> My code steps down from 1024 KiB elements on failure to 512 KiB and if that
> fails it tries 256 KiB. Then it gives up. The log output is consistent with
> my analysis. So your stated equality is an inequality when length >= 4 GiB.
> There is no promotion of unsigned int nent to uint64_t .
>
> You can write your own test harness if you don't believe me. The test machine
> doesn't need much ram. Without the call to sgl_free() corrected, if it really
> did try to get that much ram and failed toward the end, then (partially)
> freed up what it had obtained, then you would see a huge memory leak ...
>
> Now your intention seems to be that a 4 GiB sgl should be valid. Correct?
> Can that check just be dropped?

Hi Doug,

When I wrote that code, I did not expect that anyone would try to allocate
4 GiB or more as a single scatterlist. Are there any use cases for which a
4 GiB scatterlist works better than two or more smaller scatterlists?

Do you agree that many hardware DMA engines do not support transferring
4 GiB or more at once?

Thanks,

Bart.

* Re: lib/scatterlist.c : sgl_alloc_order promises more than it delivers
From: Douglas Gilbert @ 2020-10-11 21:21 UTC
To: Bart Van Assche, SCSI development list, linux-block
Cc: Martin K. Petersen, USB list

On 2020-09-26 12:32 a.m., Bart Van Assche wrote:
> On 2020-09-24 21:55, Douglas Gilbert wrote:
>> My code steps down from 1024 KiB elements on failure to 512 KiB and if that
>> fails it tries 256 KiB. Then it gives up. The log output is consistent with
>> my analysis. So your stated equality is an inequality when length >= 4 GiB.
>> There is no promotion of unsigned int nent to uint64_t .
>>
>> You can write your own test harness if you don't believe me. The test machine
>> doesn't need much ram. Without the call to sgl_free() corrected, if it really
>> did try to get that much ram and failed toward the end, then (partially)
>> freed up what it had obtained, then you would see a huge memory leak ...
>>
>> Now your intention seems to be that a 4 GiB sgl should be valid. Correct?
>> Can that check just be dropped?
>
> Hi Doug,
>
> When I wrote that code, I did not expect that anyone would try to allocate
> 4 GiB or more as a single scatterlist. Are there any use cases for which a
> 4 GiB scatterlist works better than two or more smaller scatterlists?

Then one would wonder why it has this declaration:

struct scatterlist *sgl_alloc_order(unsigned long long length,
                                    unsigned int order, bool chainable,
                                    gfp_t gfp, unsigned int *nent_p)

'unsigned long long length' [in bytes] is a lot; 64 or 128 bits worth;
definitely more than 32 bits.

And vmalloc is declared:

void *vmalloc(unsigned long size);

Which is 64 bits on a 64 bit machine (i.e. must be able to hold a pointer).
And it is vmalloc() that I want to replace with sgl_alloc_order() in the
scsi_debug driver.

Robert Love writes of vmalloc():
  "The vmalloc() function, to make nonphysically contiguous pages
   contiguous in the virtual address space, must specifically set up the
   page table entries. Worse, pages obtained via vmalloc() must be mapped
   by their individual pages (because they are not physically contiguous),
   which results in much greater TLB thrashing than you see when directly
   mapped memory is used. Because of these concerns, vmalloc() is used
   only when absolutely necessary -- typically, to obtain large regions of
   memory." ['LK Development' 3rd edition, page 244]

And scatterlist seems to be doing in the foreground what vmalloc() is doing
in the background, but without those drawbacks. My testing suggests using
a store built with sgl_alloc_order() *** is a little faster but with a
lower standard deviation (i.e. spread) on timings from repeated tests.

Another advantage of a scatterlist-based store in the scsi_debug driver is
that the data-in and data-out buffers associated with SCSI commands also come
through as scatterlist-based objects. Thus I can do almost all the
manipulations the driver needs to do to simulate a disk by adding these
general functions:
    - sgl_copy_sgl()
    - sgl_cmp_sgl()
    - sgl_memset()
    - sgl_prefetch()

A memmove() variant would be simple to implement, but the scsi_debug driver
doesn't need it.

> Do you agree that many hardware DMA engines do not support transferring
> 4 GiB or more at once?

I agree that one element of a scatter gather list should not exceed 4 GiB
of memory. In scsi_debug the scatter gather list (one per store) has in some
cases several thousand elements. But I do not agree that the _sum_ of the
size of those elements should be limited to 4 GiB.

With those two lines removed from sgl_alloc_order() I can test an 8 GiB
scsi_debug ram disk on a 16 GiB machine. [I made it into 1 partition, did
mkfs.ext4, mounted it, rsync-ed the kernel source onto it and built a kernel
that runs. A reasonable test, no?]

Doug Gilbert

*** the very useful property of sgl_alloc_order() is that each element
    of the scatter gather list has the same order (or it fails). This
    allows O(1) navigation of a big store like a 8 GiB ramdisk since
    sg_miter_skip() can be avoided with some simple integer maths.

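The "simple integer maths" mentioned in the footnote could look something like
the sketch below. This is an illustration only, not code from scsi_debug:
struct demo_sg_pos, demo_locate() and DEMO_PAGE_SHIFT are hypothetical names,
4 KiB pages are assumed, and every element is assumed to span exactly
PAGE_SIZE << order bytes. How an element index is then turned into a
scatterlist entry (for example across chained lists) is left out.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SHIFT is architecture specific. */
#define DEMO_PAGE_SHIFT 12u

/* Hypothetical result type: which element, and where inside it. */
struct demo_sg_pos {
	uint64_t elem_idx;	/* index of the element in the list */
	uint32_t elem_off;	/* byte offset within that element */
};

/*
 * Map a byte offset in the store to (element, offset) in constant time.
 * This only works because all elements share the same order; a list with
 * mixed element sizes would have to be walked instead.
 */
static struct demo_sg_pos demo_locate(uint64_t byte_off, unsigned int order)
{
	unsigned int shift = DEMO_PAGE_SHIFT + order;
	struct demo_sg_pos pos;

	pos.elem_idx = byte_off >> shift;
	pos.elem_off = (uint32_t)(byte_off & (((uint64_t)1 << shift) - 1));
	return pos;
}
```

With order 8 (1024 KiB elements), the last byte of an 8 GiB store maps to
element 8191 at offset 1048575, found with one shift and one mask and no
walk over the preceding elements.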
* Re: lib/scatterlist.c : sgl_alloc_order promises more than it delivers
From: Bart Van Assche @ 2020-10-11 22:24 UTC
To: dgilbert, SCSI development list, linux-block
Cc: Martin K. Petersen, USB list

On 10/11/20 2:21 PM, Douglas Gilbert wrote:
> My testing suggests using a store built with sgl_alloc_order() *** is a
> little faster but with a lower standard deviation (i.e. spread) on timings
> from repeated tests.

sgl_alloc_order() supports allocating SG-lists with higher order pages.
Allocating such S/G-lists is a workaround for the segment count limitations
of some DMA engines. Are you perhaps using sgl_alloc_order() for allocating
long-living data buffers? sgl_alloc_order() was not intended to be used for
that purpose.

Anyway, if your use case can be implemented without introducing any
drawbacks for other users, feel free to submit a patch.

Thanks,

Bart.