* [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations @ 2012-09-05 15:13 Paolo Bonzini 2012-09-05 15:13 ` [RFC PATCH 1/3] tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg Paolo Bonzini ` (3 more replies) 0 siblings, 4 replies; 10+ messages in thread From: Paolo Bonzini @ 2012-09-05 15:13 UTC (permalink / raw) To: linux-kernel; +Cc: target-devel Hi all, while testing PSCSI I noticed that even requests for a smallish amount of data (approximately 700 KB) failed due to an excessive number of segments in the request. In fact, using alloc_page resulted in a completely fragmented request, with no merging of consecutive pages at all. This patch series fixes this problem by using higher-order allocations to build the data scatterlist. The problem is that iscsi assumes that the scatterlist consists of single pages, which is not true anymore. So patch 2 has to introduce some relatively complicated changes to iscsi_map_iovec and iscsi_unmap_iovec. While doing this, I noticed something strange in iscsit_do_crypto_hash_sg. Patch 1 adds a warning about it. The approach may be completely wrong and it needs more testing anyway. Please review! Paolo Paolo Bonzini (3): tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg tcm_iscsi: support multiple sizes in the scatterlist target: try satisfying memory requests with contiguous blocks drivers/target/iscsi/iscsi_target.c | 106 +++++++++++++++++++++++++----- drivers/target/iscsi/iscsi_target_core.h | 2 +- drivers/target/target_core_transport.c | 58 ++++++++++++++--- 3 files changed, 138 insertions(+), 28 deletions(-) ^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC PATCH 1/3] tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg 2012-09-05 15:13 [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Paolo Bonzini @ 2012-09-05 15:13 ` Paolo Bonzini 2012-09-05 15:13 ` [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist Paolo Bonzini ` (2 subsequent siblings) 3 siblings, 0 replies; 10+ messages in thread From: Paolo Bonzini @ 2012-09-05 15:13 UTC (permalink / raw) To: linux-kernel; +Cc: target-devel While iscsit_do_crypto_hash_sg does look at cmd->first_data_sg_off when computing cur_len, it ignores it completely when computing the hash. As a result, the first cmd->first_data_sg_off bytes are included in the hash even though they should not. Warn if this happens. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- drivers/target/iscsi/iscsi_target.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 97c0f78..224679e 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -1135,6 +1135,7 @@ static u32 iscsit_do_crypto_hash_sg( sg = cmd->first_data_sg; page_off = cmd->first_data_sg_off; + WARN_ON(page_off != sg->offset); i = 0; while (data_length) { -- 1.7.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist 2012-09-05 15:13 [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Paolo Bonzini 2012-09-05 15:13 ` [RFC PATCH 1/3] tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg Paolo Bonzini @ 2012-09-05 15:13 ` Paolo Bonzini 2012-09-06 2:33 ` Nicholas A. Bellinger 2012-09-05 15:13 ` [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks Paolo Bonzini 2012-09-06 1:58 ` [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Nicholas A. Bellinger 3 siblings, 1 reply; 10+ messages in thread From: Paolo Bonzini @ 2012-09-05 15:13 UTC (permalink / raw) To: linux-kernel; +Cc: target-devel The next patch will use multiple orders to satisfy the allocation of the data buffers. We need to support this in iscsi_map_iovec and iscsi_unmap_iovec. The idea here is to walk each relevant page in the scatterlist (which may not be every page in the scatterlist due to data_offset) and kmap it. iscsi_unmap_iovec uses the same algorithm, so that each kunmap maps the kmap that was used in iscsi_map_iovec. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- drivers/target/iscsi/iscsi_target.c | 106 ++++++++++++++++++++++++----- drivers/target/iscsi/iscsi_target_core.h | 2 +- 2 files changed, 89 insertions(+), 19 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 224679e..dc4f5da 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -684,45 +684,115 @@ static int iscsit_map_iovec( u32 data_offset, u32 data_length) { - u32 i = 0; struct scatterlist *sg; - unsigned int page_off; + u32 sg_off, sg_left, cur_len; + u32 sg_pfn = 0; + u32 i = 0; - /* - * We know each entry in t_data_sg contains a page. - */ - sg = &cmd->se_cmd.t_data_sg[data_offset / PAGE_SIZE]; - page_off = (data_offset % PAGE_SIZE); + sg = cmd->se_cmd.t_data_sg; + while (data_offset >= sg->length) { + data_offset -= sg->length; + sg++; + } cmd->first_data_sg = sg; - cmd->first_data_sg_off = page_off; + cmd->first_data_sg_off = sg->offset + data_offset; + cmd->kmapped_length = data_length; + sg_left = 0; while (data_length) { - u32 cur_len = min_t(u32, data_length, sg->length - page_off); + if (!sg_left) { + sg_off = sg->offset + data_offset; + sg_pfn = page_to_pfn(sg_page(sg)); + sg_pfn += sg_off >> PAGE_SHIFT; + sg_off &= PAGE_SIZE - 1; + + sg_left = min(sg->length, data_length); + + /* + * The next scatterlist is mapped from the beginning. + */ + sg++; + data_offset = 0; + } - iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; + /* + * Create an iovec for (a part of) this page. + */ + cur_len = min_t(u32, PAGE_SIZE - sg_off, sg_left); + BUG_ON(!cur_len); + + iov[i].iov_base = kmap(pfn_to_page(sg_pfn)) + sg_off; iov[i].iov_len = cur_len; + i++; data_length -= cur_len; - page_off = 0; - sg = sg_next(sg); - i++; - } + sg_left -= cur_len; - cmd->kmapped_nents = i; + /* + * The next pfn is mapped from the beginning. + */ + sg_pfn++; + sg_off = 0; + } + BUG_ON(sg_left); return i; } static void iscsit_unmap_iovec(struct iscsi_cmd *cmd) { - u32 i; struct scatterlist *sg; + u32 data_offset, data_length; + u32 sg_off, sg_left, cur_len; + u32 sg_pfn = 0; + + if (!cmd->kmapped_length) + return; sg = cmd->first_data_sg; + data_offset = cmd->first_data_sg_off - sg->offset; + data_length = cmd->kmapped_length; + + /* + * This loop must mirror exactly what is done in iscsi_map_iovec. + */ + sg_left = 0; + while (data_length) { + if (!sg_left) { + sg_off = sg->offset + data_offset; + sg_pfn = page_to_pfn(sg_page(sg)); + sg_pfn += sg_off >> PAGE_SHIFT; + sg_off &= PAGE_SIZE - 1; + + sg_left = min(sg->length, data_length); + + /* + * The next scatterlist is mapped from the beginning. + */ + sg++; + data_offset = 0; + } + + /* + * Create an iovec for (a part of) this page. + */ + cur_len = min_t(u32, PAGE_SIZE - sg_off, sg_left); + BUG_ON(!cur_len); + + kunmap(pfn_to_page(sg_pfn)); + data_length -= cur_len; + sg_left -= cur_len; + + /* + * The next pfn is mapped from the beginning. + */ + sg_pfn++; + sg_off = 0; + } - for (i = 0; i < cmd->kmapped_nents; i++) - kunmap(sg_page(&sg[i])); + BUG_ON(sg_left); + cmd->kmapped_length = 0; } static void iscsit_ack_from_expstatsn(struct iscsi_conn *conn, u32 exp_statsn) diff --git a/drivers/target/iscsi/iscsi_target_core.h b/drivers/target/iscsi/iscsi_target_core.h index 8a908b2..13bdaf1 100644 --- a/drivers/target/iscsi/iscsi_target_core.h +++ b/drivers/target/iscsi/iscsi_target_core.h @@ -472,7 +472,7 @@ struct iscsi_cmd { struct scatterlist *first_data_sg; u32 first_data_sg_off; - u32 kmapped_nents; + u32 kmapped_length; } ____cacheline_aligned; -- 1.7.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist 2012-09-05 15:13 ` [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist Paolo Bonzini @ 2012-09-06 2:33 ` Nicholas A. Bellinger 0 siblings, 0 replies; 10+ messages in thread From: Nicholas A. Bellinger @ 2012-09-06 2:33 UTC (permalink / raw) To: Paolo Bonzini Cc: linux-kernel, target-devel, Andy Grover, Christoph Hellwig, Roland Dreier On Wed, 2012-09-05 at 17:13 +0200, Paolo Bonzini wrote: > The next patch will use multiple orders to satisfy the allocation > of the data buffers. > > We need to support this in iscsi_map_iovec and iscsi_unmap_iovec. > The idea here is to walk each relevant page in the scatterlist > (which may not be every page in the scatterlist due to data_offset) > and kmap it. iscsi_unmap_iovec uses the same algorithm, so that > each kunmap maps the kmap that was used in iscsi_map_iovec. > CC'ing Agrover here as he last touched the iscsit_[map,unmap]_iovec() before the mainline merge of this code.. Also, this type of change is going to require other Reviewed-By's and serious fio+writeverify Tested-By's to make sure we get all of the corner cases as this enabled for upstream iscsi-target. I'll be having a deeper look sometime next week, but as long as we can selectively disable this for fabrics I don't have a problem considering it for-3.7 code. Nice work Paolo! > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > drivers/target/iscsi/iscsi_target.c | 106 ++++++++++++++++++++++++----- > drivers/target/iscsi/iscsi_target_core.h | 2 +- > 2 files changed, 89 insertions(+), 19 deletions(-) > > diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c > index 224679e..dc4f5da 100644 > --- a/drivers/target/iscsi/iscsi_target.c > +++ b/drivers/target/iscsi/iscsi_target.c > @@ -684,45 +684,115 @@ static int iscsit_map_iovec( > u32 data_offset, > u32 data_length) > { > - u32 i = 0; > struct scatterlist *sg; > - unsigned int page_off; > + u32 sg_off, sg_left, cur_len; > + u32 sg_pfn = 0; > + u32 i = 0; > > - /* > - * We know each entry in t_data_sg contains a page. > - */ > - sg = &cmd->se_cmd.t_data_sg[data_offset / PAGE_SIZE]; > - page_off = (data_offset % PAGE_SIZE); > + sg = cmd->se_cmd.t_data_sg; > + while (data_offset >= sg->length) { > + data_offset -= sg->length; > + sg++; > + } > > cmd->first_data_sg = sg; > - cmd->first_data_sg_off = page_off; > + cmd->first_data_sg_off = sg->offset + data_offset; > + cmd->kmapped_length = data_length; > > + sg_left = 0; > while (data_length) { > - u32 cur_len = min_t(u32, data_length, sg->length - page_off); > + if (!sg_left) { > + sg_off = sg->offset + data_offset; > + sg_pfn = page_to_pfn(sg_page(sg)); > + sg_pfn += sg_off >> PAGE_SHIFT; > + sg_off &= PAGE_SIZE - 1; > + > + sg_left = min(sg->length, data_length); > + > + /* > + * The next scatterlist is mapped from the beginning. > + */ > + sg++; > + data_offset = 0; > + } > > - iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; > + /* > + * Create an iovec for (a part of) this page. > + */ > + cur_len = min_t(u32, PAGE_SIZE - sg_off, sg_left); > + BUG_ON(!cur_len); > + > + iov[i].iov_base = kmap(pfn_to_page(sg_pfn)) + sg_off; > iov[i].iov_len = cur_len; > + i++; > > data_length -= cur_len; > - page_off = 0; > - sg = sg_next(sg); > - i++; > - } > + sg_left -= cur_len; > > - cmd->kmapped_nents = i; > + /* > + * The next pfn is mapped from the beginning. > + */ > + sg_pfn++; > + sg_off = 0; > + } > > + BUG_ON(sg_left); > return i; > } > > static void iscsit_unmap_iovec(struct iscsi_cmd *cmd) > { > - u32 i; > struct scatterlist *sg; > + u32 data_offset, data_length; > + u32 sg_off, sg_left, cur_len; > + u32 sg_pfn = 0; > + > + if (!cmd->kmapped_length) > + return; > > sg = cmd->first_data_sg; > + data_offset = cmd->first_data_sg_off - sg->offset; > + data_length = cmd->kmapped_length; > + > + /* > + * This loop must mirror exactly what is done in iscsi_map_iovec. > + */ > + sg_left = 0; > + while (data_length) { > + if (!sg_left) { > + sg_off = sg->offset + data_offset; > + sg_pfn = page_to_pfn(sg_page(sg)); > + sg_pfn += sg_off >> PAGE_SHIFT; > + sg_off &= PAGE_SIZE - 1; > + > + sg_left = min(sg->length, data_length); > + > + /* > + * The next scatterlist is mapped from the beginning. > + */ > + sg++; > + data_offset = 0; > + } > + > + /* > + * Create an iovec for (a part of) this page. > + */ > + cur_len = min_t(u32, PAGE_SIZE - sg_off, sg_left); > + BUG_ON(!cur_len); > + > + kunmap(pfn_to_page(sg_pfn)); > + data_length -= cur_len; > + sg_left -= cur_len; > + > + /* > + * The next pfn is mapped from the beginning. > + */ > + sg_pfn++; > + sg_off = 0; > + } > > - for (i = 0; i < cmd->kmapped_nents; i++) > - kunmap(sg_page(&sg[i])); > + BUG_ON(sg_left); > + cmd->kmapped_length = 0; > } > > static void iscsit_ack_from_expstatsn(struct iscsi_conn *conn, u32 exp_statsn) > diff --git a/drivers/target/iscsi/iscsi_target_core.h b/drivers/target/iscsi/iscsi_target_core.h > index 8a908b2..13bdaf1 100644 > --- a/drivers/target/iscsi/iscsi_target_core.h > +++ b/drivers/target/iscsi/iscsi_target_core.h > @@ -472,7 +472,7 @@ struct iscsi_cmd { > > struct scatterlist *first_data_sg; > u32 first_data_sg_off; > - u32 kmapped_nents; > + u32 kmapped_length; > > } ____cacheline_aligned; > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks 2012-09-05 15:13 [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Paolo Bonzini 2012-09-05 15:13 ` [RFC PATCH 1/3] tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg Paolo Bonzini 2012-09-05 15:13 ` [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist Paolo Bonzini @ 2012-09-05 15:13 ` Paolo Bonzini 2012-09-06 2:19 ` Nicholas A. Bellinger 2012-09-06 1:58 ` [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Nicholas A. Bellinger 3 siblings, 1 reply; 10+ messages in thread From: Paolo Bonzini @ 2012-09-05 15:13 UTC (permalink / raw) To: linux-kernel; +Cc: target-devel transport_generic_get_mem's strategy of allocating pages one-by-one for the data scatterlist may fail for even not-so-big data transfers if the underlying device poses a small limit on the number of segments. This patch fixes this problem by trying to allocate pages in relatively large groups (at most 512 pages right now, equivalent to 2 MB per allocation), but still without any slack. The remainder is then satisfied with subsequent smaller allocations. For example, if the data to be transferred is 132 KB, the new algorithm will attempt a single 128 KB allocation, followed by a 4 KB allocation. In case a contiguous block cannot be found, it will fall back to a progressively smaller order. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- drivers/target/target_core_transport.c | 58 +++++++++++++++++++++++++++----- 1 files changed, 49 insertions(+), 9 deletions(-) diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 09028af..6e90e2c 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -2058,7 +2058,7 @@ static inline void transport_free_sgl(struct scatterlist *sgl, int nents) int count; for_each_sg(sgl, sg, nents, count) - __free_page(sg_page(sg)); + __free_pages(sg_page(sg), get_order(sg->length)); kfree(sgl); } @@ -2229,41 +2229,81 @@ void transport_kunmap_data_sg(struct se_cmd *cmd) } EXPORT_SYMBOL(transport_kunmap_data_sg); +#define TRANSPORT_MAX_ORDER 9 + +static inline int +get_order_no_slack(u32 length) +{ + /* + * get_order gives the smallest order that length fits in. + * We want instead the largest order that does not give + * any slack. + * + * Example (with PAGE_SIZE = 4096, PAGE_SHIFT = 12): + * get_order_no_slack(0x1000) = fls(0) = 0, get_order(0x1000) = 0 + * get_order_no_slack(0x1001) = fls(0) = 0, get_order(0x1001) = 1 + * get_order_no_slack(0x1fff) = fls(0) = 0, get_order(0x1fff) = 1 + * get_order_no_slack(0x2000) = fls(1) = 1, get_order(0x2000) = 1 + * get_order_no_slack(0x3000) = fls(1) = 1, get_order(0x3000) = 2 + * get_order_no_slack(0x4000) = fls(2) = 2, get_order(0x4000) = 2 + * get_order_no_slack(0x8000) = fls(4) = 3, get_order(0x8000) = 3 + */ + length >>= PAGE_SHIFT + 1; + return fls(length); +} + static int transport_generic_get_mem(struct se_cmd *cmd) { u32 length = cmd->data_length; + u32 page_len; unsigned int nents; struct page *page; gfp_t zero_flag; int i = 0; + int order, max_order; nents = DIV_ROUND_UP(length, PAGE_SIZE); cmd->t_data_sg = kmalloc(sizeof(struct scatterlist) * nents, GFP_KERNEL); if (!cmd->t_data_sg) return -ENOMEM; - cmd->t_data_nents = nents; sg_init_table(cmd->t_data_sg, nents); zero_flag = cmd->se_cmd_flags & SCF_SCSI_DATA_CDB ? 0 : __GFP_ZERO; + max_order = TRANSPORT_MAX_ORDER; while (length) { - u32 page_len = min_t(u32, length, PAGE_SIZE); - page = alloc_page(GFP_KERNEL | zero_flag); - if (!page) - goto out; + order = get_order_no_slack(length); + for (;;) { + order = min(order, max_order); + page_len = min_t(u32, length, PAGE_SIZE << order); + page = alloc_pages(GFP_KERNEL | zero_flag, order); + if (page) + break; + + if (!order) + goto out; + + /* + * Allocation failed, only try with a smaller order + * from this point. + */ + max_order = order - 1; + } sg_set_page(&cmd->t_data_sg[i], page, page_len, 0); length -= page_len; i++; } + cmd->t_data_nents = i; + sg_mark_end(&cmd->t_data_sg[i - 1]); return 0; out: - while (i > 0) { - i--; - __free_page(sg_page(&cmd->t_data_sg[i])); + while (i-- > 0) { + struct scatterlist *sg = &cmd->t_data_sg[i]; + __free_pages(sg_page(sg), get_order(sg->length)); } kfree(cmd->t_data_sg); cmd->t_data_sg = NULL; -- 1.7.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks 2012-09-05 15:13 ` [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks Paolo Bonzini @ 2012-09-06 2:19 ` Nicholas A. Bellinger 0 siblings, 0 replies; 10+ messages in thread From: Nicholas A. Bellinger @ 2012-09-06 2:19 UTC (permalink / raw) To: Paolo Bonzini; +Cc: linux-kernel, target-devel, Christoph Hellwig On Wed, 2012-09-05 at 17:13 +0200, Paolo Bonzini wrote: > transport_generic_get_mem's strategy of allocating pages one-by-one > for the data scatterlist may fail for even not-so-big data transfers > if the underlying device poses a small limit on the number of segments. > > This patch fixes this problem by trying to allocate pages in relatively > large groups (at most 512 pages right now, equivalent to 2 MB per > allocation), but still without any slack. The remainder is then > satisfied with subsequent smaller allocations. For example, if the > data to be transferred is 132 KB, the new algorithm will attempt > a single 128 KB allocation, followed by a 4 KB allocation. > In case a contiguous block cannot be found, it will fall back to > a progressively smaller order. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > drivers/target/target_core_transport.c | 58 +++++++++++++++++++++++++++----- > 1 files changed, 49 insertions(+), 9 deletions(-) > > diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c > index 09028af..6e90e2c 100644 > --- a/drivers/target/target_core_transport.c > +++ b/drivers/target/target_core_transport.c > @@ -2058,7 +2058,7 @@ static inline void transport_free_sgl(struct scatterlist *sgl, int nents) > int count; > > for_each_sg(sgl, sg, nents, count) > - __free_page(sg_page(sg)); > + __free_pages(sg_page(sg), get_order(sg->length)); > > kfree(sgl); > } > @@ -2229,41 +2229,81 @@ void transport_kunmap_data_sg(struct se_cmd *cmd) > } > EXPORT_SYMBOL(transport_kunmap_data_sg); > > +#define TRANSPORT_MAX_ORDER 9 > + > +static inline int > +get_order_no_slack(u32 length) > +{ > + /* > + * get_order gives the smallest order that length fits in. > + * We want instead the largest order that does not give > + * any slack. > + * > + * Example (with PAGE_SIZE = 4096, PAGE_SHIFT = 12): > + * get_order_no_slack(0x1000) = fls(0) = 0, get_order(0x1000) = 0 > + * get_order_no_slack(0x1001) = fls(0) = 0, get_order(0x1001) = 1 > + * get_order_no_slack(0x1fff) = fls(0) = 0, get_order(0x1fff) = 1 > + * get_order_no_slack(0x2000) = fls(1) = 1, get_order(0x2000) = 1 > + * get_order_no_slack(0x3000) = fls(1) = 1, get_order(0x3000) = 2 > + * get_order_no_slack(0x4000) = fls(2) = 2, get_order(0x4000) = 2 > + * get_order_no_slack(0x8000) = fls(4) = 3, get_order(0x8000) = 3 > + */ > + length >>= PAGE_SHIFT + 1; > + return fls(length); > +} > + > static int > transport_generic_get_mem(struct se_cmd *cmd) > { > u32 length = cmd->data_length; > + u32 page_len; > unsigned int nents; > struct page *page; > gfp_t zero_flag; > int i = 0; > + int order, max_order; > > nents = DIV_ROUND_UP(length, PAGE_SIZE); > cmd->t_data_sg = kmalloc(sizeof(struct scatterlist) * nents, GFP_KERNEL); > if (!cmd->t_data_sg) > return -ENOMEM; > > - cmd->t_data_nents = nents; > sg_init_table(cmd->t_data_sg, nents); > > zero_flag = cmd->se_cmd_flags & SCF_SCSI_DATA_CDB ? 0 : __GFP_ZERO; > > + max_order = TRANSPORT_MAX_ORDER; Please go ahead and turn max_order into an DEF_DEV_ATTRIB within target_core_configfs.c, so we're able to change this value on the fly using an per device configfs attr. > while (length) { > - u32 page_len = min_t(u32, length, PAGE_SIZE); > - page = alloc_page(GFP_KERNEL | zero_flag); > - if (!page) > - goto out; > + order = get_order_no_slack(length); > + for (;;) { > + order = min(order, max_order); > + page_len = min_t(u32, length, PAGE_SIZE << order); > + page = alloc_pages(GFP_KERNEL | zero_flag, order); > + if (page) > + break; > + > + if (!order) > + goto out; > + > + /* > + * Allocation failed, only try with a smaller order > + * from this point. > + */ > + max_order = order - 1; > + } Seems reasonable to back-off in reverse max_order in the face of large order allocation failures.. > > sg_set_page(&cmd->t_data_sg[i], page, page_len, 0); > length -= page_len; > i++; > } > + cmd->t_data_nents = i; > + sg_mark_end(&cmd->t_data_sg[i - 1]); > return 0; > > out: > - while (i > 0) { > - i--; > - __free_page(sg_page(&cmd->t_data_sg[i])); > + while (i-- > 0) { > + struct scatterlist *sg = &cmd->t_data_sg[i]; > + __free_pages(sg_page(sg), get_order(sg->length)); > } > kfree(cmd->t_data_sg); > cmd->t_data_sg = NULL; So what I'd like to see for this patch in the short term is a new target_core_fabrics_op bit that is (by default) assuming single order SGL page allocations, that can optionally be enabled for iscsi-target + other fabrics that support it as we bring other multi-page SGLs fabric drivers online. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations 2012-09-05 15:13 [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Paolo Bonzini ` (2 preceding siblings ...) 2012-09-05 15:13 ` [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks Paolo Bonzini @ 2012-09-06 1:58 ` Nicholas A. Bellinger 2012-09-06 9:04 ` Paolo Bonzini 3 siblings, 1 reply; 10+ messages in thread From: Nicholas A. Bellinger @ 2012-09-06 1:58 UTC (permalink / raw) To: Paolo Bonzini; +Cc: linux-kernel, target-devel, Christoph Hellwig On Wed, 2012-09-05 at 17:13 +0200, Paolo Bonzini wrote: > Hi all, > > while testing PSCSI I noticed that even requests for a smallish amount > of data (approximately 700 KB) failed due to an excessive number of > segments in the request. In fact, using alloc_page resulted in a > completely fragmented request, with no merging of consecutive pages > at all. > > This patch series fixes this problem by using higher-order allocations > to build the data scatterlist. The problem is that iscsi assumes that the > scatterlist consists of single pages, which is not true anymore. So > patch 2 has to introduce some relatively complicated changes to > iscsi_map_iovec and iscsi_unmap_iovec. > So enabling multi-page per SGL support is a feature that has been dormant within target core for a long time. It's about time that we start taking advantage of it again. ;) > While doing this, I noticed something strange in iscsit_do_crypto_hash_sg. > Patch 1 adds a warning about it. > Mmmmm, looks like a separate bug with DataDigest enabled. > The approach may be completely wrong and it needs more testing anyway. > Please review! > Adding my comments inline. Thanks Paolo! --nab > Paolo > > Paolo Bonzini (3): > tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg > tcm_iscsi: support multiple sizes in the scatterlist > target: try satisfying memory requests with contiguous blocks > > drivers/target/iscsi/iscsi_target.c | 106 +++++++++++++++++++++++++----- > drivers/target/iscsi/iscsi_target_core.h | 2 +- > drivers/target/target_core_transport.c | 58 ++++++++++++++--- > 3 files changed, 138 insertions(+), 28 deletions(-) > > -- > To unsubscribe from this list: send the line "unsubscribe target-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations 2012-09-06 1:58 ` [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Nicholas A. Bellinger @ 2012-09-06 9:04 ` Paolo Bonzini 2012-09-06 18:52 ` Nicholas A. Bellinger 0 siblings, 1 reply; 10+ messages in thread From: Paolo Bonzini @ 2012-09-06 9:04 UTC (permalink / raw) To: Nicholas A. Bellinger; +Cc: linux-kernel, target-devel, Christoph Hellwig Il 06/09/2012 03:58, Nicholas A. Bellinger ha scritto: >> This patch series fixes this problem by using higher-order allocations >> to build the data scatterlist. The problem is that iscsi assumes that the >> scatterlist consists of single pages, which is not true anymore. So >> patch 2 has to introduce some relatively complicated changes to >> iscsi_map_iovec and iscsi_unmap_iovec. > > So enabling multi-page per SGL support is a feature that has been > dormant within target core for a long time. It's about time that we > start taking advantage of it again. ;) Yeah, I noticed some preparation for it in tcm_fc/tfc_io.c, though too late (they look a lot like my iscsi changes, it would have saved me some time!). While this is obviously not to be taken lightly, I disagree with making this a per-fabric choice. With a properly organized (and bisectable) series, it should be relatively easy to review and to get right. I looked a bit more closely now and there are no changes needed to other targets (actually there is a change needed in tcm_qla2xxx, but the code is currently disabled). There are however changes to transport_kmap_data_sg needed and a few other places. I definitely agree with your other comments, including making max_order a DEF_DEV_ATTRIB. In addition, the default max_order should be capped based on queue_max_sectors(q) if applicable, to avoid hitting this scenario: /* * XXX: if the length the device accepts is shorter than the * length of the S/G list entry this will cause and * endless loop. Better hope no driver uses huge pages. */ Paolo >> While doing this, I noticed something strange in iscsit_do_crypto_hash_sg. >> Patch 1 adds a warning about it. >> > > Mmmmm, looks like a separate bug with DataDigest enabled. > >> The approach may be completely wrong and it needs more testing anyway. >> Please review! >> > > Adding my comments inline. > > Thanks Paolo! > > --nab > >> Paolo >> >> Paolo Bonzini (3): >> tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg >> tcm_iscsi: support multiple sizes in the scatterlist >> target: try satisfying memory requests with contiguous blocks >> >> drivers/target/iscsi/iscsi_target.c | 106 +++++++++++++++++++++++++----- >> drivers/target/iscsi/iscsi_target_core.h | 2 +- >> drivers/target/target_core_transport.c | 58 ++++++++++++++--- >> 3 files changed, 138 insertions(+), 28 deletions(-) >> >> -- >> To unsubscribe from this list: send the line "unsubscribe target-devel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations 2012-09-06 9:04 ` Paolo Bonzini @ 2012-09-06 18:52 ` Nicholas A. Bellinger 2012-09-06 20:49 ` Paolo Bonzini 0 siblings, 1 reply; 10+ messages in thread From: Nicholas A. Bellinger @ 2012-09-06 18:52 UTC (permalink / raw) To: Paolo Bonzini; +Cc: linux-kernel, target-devel, Christoph Hellwig On Thu, 2012-09-06 at 11:04 +0200, Paolo Bonzini wrote: > Il 06/09/2012 03:58, Nicholas A. Bellinger ha scritto: > >> This patch series fixes this problem by using higher-order allocations > >> to build the data scatterlist. The problem is that iscsi assumes that the > >> scatterlist consists of single pages, which is not true anymore. So > >> patch 2 has to introduce some relatively complicated changes to > >> iscsi_map_iovec and iscsi_unmap_iovec. > > > > So enabling multi-page per SGL support is a feature that has been > > dormant within target core for a long time. It's about time that we > > start taking advantage of it again. ;) > > Yeah, I noticed some preparation for it in tcm_fc/tfc_io.c, though too > late (they look a lot like my iscsi changes, it would have saved me some > time!). > > While this is obviously not to be taken lightly, I disagree with making > this a per-fabric choice. With a properly organized (and bisectable) > series, it should be relatively easy to review and to get right. It's a temporary bit that allows us to figure out which fabrics can (safely) be enabled for multi-page SGLs operation for the short term within for-next code. Unless your prepared to commit to fio+writeverify'ing 8x mainline fabric drivers in many different types of fabric dependent I/O combination for high order allocations, I'd still prefer to have some way to disable this optimization in a per fabric basis if we really need too. That way we can just disable a problematic fabric instead of having to revert the whole thing if users run into problems with a specific fabric module late during the cycle. If the other fabric maintainers are OK with enabling this in their code and give their Reviewed-By's + Tested-By's, then I have no problem dropping this extra bit once everything has been converted. > I looked a bit more closely now and there are no changes needed to other > targets (actually there is a change needed in tcm_qla2xxx, but the code > is currently disabled). > > There are however changes to transport_kmap_data_sg needed and a few > other places. > > I definitely agree with your other comments, including making max_order > a DEF_DEV_ATTRIB. In addition, the default max_order should be capped > based on queue_max_sectors(q) if applicable, to avoid hitting this scenario: > > /* > * XXX: if the length the device accepts is shorter than the > * length of the S/G list entry this will cause and > * endless loop. Better hope no driver uses huge pages. > */ > Mmmmm, indeed. Also, I'm not sure that every old SCSI LLD is smart enough to handle high older allocations -> multi-page SGLs either.. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations 2012-09-06 18:52 ` Nicholas A. Bellinger @ 2012-09-06 20:49 ` Paolo Bonzini 0 siblings, 0 replies; 10+ messages in thread From: Paolo Bonzini @ 2012-09-06 20:49 UTC (permalink / raw) To: Nicholas A. Bellinger; +Cc: linux-kernel, target-devel, Christoph Hellwig Il 06/09/2012 20:52, Nicholas A. Bellinger ha scritto: > That way we can just disable a problematic fabric instead of having to > revert the whole thing if users run into problems with a specific fabric > module late during the cycle. If the other fabric maintainers are OK > with enabling this in their code and give their Reviewed-By's + > Tested-By's, then I have no problem dropping this extra bit once > everything has been converted. Fair enough. > Mmmmm, indeed. Also, I'm not sure that every old SCSI LLD is smart > enough to handle high older allocations -> multi-page SGLs either.. If not, they should simply set max_segment_size to 4096. Paolo ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-09-06 20:49 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-09-05 15:13 [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Paolo Bonzini 2012-09-05 15:13 ` [RFC PATCH 1/3] tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg Paolo Bonzini 2012-09-05 15:13 ` [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist Paolo Bonzini 2012-09-06 2:33 ` Nicholas A. Bellinger 2012-09-05 15:13 ` [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks Paolo Bonzini 2012-09-06 2:19 ` Nicholas A. Bellinger 2012-09-06 1:58 ` [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Nicholas A. Bellinger 2012-09-06 9:04 ` Paolo Bonzini 2012-09-06 18:52 ` Nicholas A. Bellinger 2012-09-06 20:49 ` Paolo Bonzini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).