linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations
@ 2012-09-05 15:13 Paolo Bonzini
  2012-09-05 15:13 ` [RFC PATCH 1/3] tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg Paolo Bonzini
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Paolo Bonzini @ 2012-09-05 15:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: target-devel

Hi all,

while testing PSCSI I noticed that even requests for a smallish amount
of data (approximately 700 KB) failed due to an excessive number of
segments in the request.  In fact, using alloc_page resulted in a
completely fragmented request, with no merging of consecutive pages
at all.

This patch series fixes this problem by using higher-order allocations
to build the data scatterlist.  The problem is that iscsi assumes that the
scatterlist consists of single pages, which is not true anymore.  So
patch 2 has to introduce some relatively complicated changes to
iscsi_map_iovec and iscsi_unmap_iovec.

While doing this, I noticed something strange in iscsit_do_crypto_hash_sg.
Patch 1 adds a warning about it.

The approach may be completely wrong and it needs more testing anyway.
Please review!

Paolo

Paolo Bonzini (3):
  tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg
  tcm_iscsi: support multiple sizes in the scatterlist
  target: try satisfying memory requests with contiguous blocks

 drivers/target/iscsi/iscsi_target.c      |  106 +++++++++++++++++++++++++-----
 drivers/target/iscsi/iscsi_target_core.h |    2 +-
 drivers/target/target_core_transport.c   |   58 ++++++++++++++---
 3 files changed, 138 insertions(+), 28 deletions(-)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH 1/3] tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg
  2012-09-05 15:13 [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Paolo Bonzini
@ 2012-09-05 15:13 ` Paolo Bonzini
  2012-09-05 15:13 ` [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist Paolo Bonzini
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2012-09-05 15:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: target-devel

While iscsit_do_crypto_hash_sg does look at cmd->first_data_sg_off
when computing cur_len, it ignores it completely when computing the
hash.  As a result, the first cmd->first_data_sg_off bytes are
included in the hash even though they should not.  Warn if this
happens.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/target/iscsi/iscsi_target.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 97c0f78..224679e 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -1135,6 +1135,7 @@ static u32 iscsit_do_crypto_hash_sg(
 
 	sg = cmd->first_data_sg;
 	page_off = cmd->first_data_sg_off;
+	WARN_ON(page_off != sg->offset);
 
 	i = 0;
 	while (data_length) {
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist
  2012-09-05 15:13 [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Paolo Bonzini
  2012-09-05 15:13 ` [RFC PATCH 1/3] tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg Paolo Bonzini
@ 2012-09-05 15:13 ` Paolo Bonzini
  2012-09-06  2:33   ` Nicholas A. Bellinger
  2012-09-05 15:13 ` [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks Paolo Bonzini
  2012-09-06  1:58 ` [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Nicholas A. Bellinger
  3 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2012-09-05 15:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: target-devel

The next patch will use multiple orders to satisfy the allocation
of the data buffers.

We need to support this in iscsi_map_iovec and iscsi_unmap_iovec.
The idea here is to walk each relevant page in the scatterlist
(which may not be every page in the scatterlist due to data_offset)
and kmap it.  iscsi_unmap_iovec uses the same algorithm, so that
each kunmap maps the kmap that was used in iscsi_map_iovec.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/target/iscsi/iscsi_target.c      |  106 ++++++++++++++++++++++++-----
 drivers/target/iscsi/iscsi_target_core.h |    2 +-
 2 files changed, 89 insertions(+), 19 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 224679e..dc4f5da 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -684,45 +684,115 @@ static int iscsit_map_iovec(
 	u32 data_offset,
 	u32 data_length)
 {
-	u32 i = 0;
 	struct scatterlist *sg;
-	unsigned int page_off;
+	u32 sg_off, sg_left, cur_len;
+	u32 sg_pfn = 0;
+	u32 i = 0;
 
-	/*
-	 * We know each entry in t_data_sg contains a page.
-	 */
-	sg = &cmd->se_cmd.t_data_sg[data_offset / PAGE_SIZE];
-	page_off = (data_offset % PAGE_SIZE);
+	sg = cmd->se_cmd.t_data_sg;
+	while (data_offset >= sg->length) {
+		data_offset -= sg->length;
+		sg++;
+	}
 
 	cmd->first_data_sg = sg;
-	cmd->first_data_sg_off = page_off;
+	cmd->first_data_sg_off = sg->offset + data_offset;
+	cmd->kmapped_length = data_length;
 
+	sg_left = 0;
 	while (data_length) {
-		u32 cur_len = min_t(u32, data_length, sg->length - page_off);
+		if (!sg_left) {
+			sg_off = sg->offset + data_offset;
+			sg_pfn = page_to_pfn(sg_page(sg));
+			sg_pfn += sg_off >> PAGE_SHIFT;
+			sg_off &= PAGE_SIZE - 1;
+
+			sg_left = min(sg->length, data_length);
+
+			/*
+			 * The next scatterlist is mapped from the beginning.
+			 */
+			sg++;
+			data_offset = 0;
+		}
 
-		iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off;
+		/*
+		 * Create an iovec for (a part of) this page.
+		 */
+		cur_len = min_t(u32, PAGE_SIZE - sg_off, sg_left);
+		BUG_ON(!cur_len);
+
+		iov[i].iov_base = kmap(pfn_to_page(sg_pfn)) + sg_off;
 		iov[i].iov_len = cur_len;
+		i++;
 
 		data_length -= cur_len;
-		page_off = 0;
-		sg = sg_next(sg);
-		i++;
-	}
+		sg_left -= cur_len;
 
-	cmd->kmapped_nents = i;
+		/*
+		 * The next pfn is mapped from the beginning.
+		 */
+		sg_pfn++;
+		sg_off = 0;
+	}
 
+	BUG_ON(sg_left);
 	return i;
 }
 
 static void iscsit_unmap_iovec(struct iscsi_cmd *cmd)
 {
-	u32 i;
 	struct scatterlist *sg;
+	u32 data_offset, data_length;
+	u32 sg_off, sg_left, cur_len;
+	u32 sg_pfn = 0;
+
+	if (!cmd->kmapped_length)
+		return;
 
 	sg = cmd->first_data_sg;
+	data_offset = cmd->first_data_sg_off - sg->offset;
+	data_length = cmd->kmapped_length;
+
+	/*
+	 * This loop must mirror exactly what is done in iscsi_map_iovec.
+	 */
+	sg_left = 0;
+	while (data_length) {
+		if (!sg_left) {
+			sg_off = sg->offset + data_offset;
+			sg_pfn = page_to_pfn(sg_page(sg));
+			sg_pfn += sg_off >> PAGE_SHIFT;
+			sg_off &= PAGE_SIZE - 1;
+
+			sg_left = min(sg->length, data_length);
+
+			/*
+			 * The next scatterlist is mapped from the beginning.
+			 */
+			sg++;
+			data_offset = 0;
+		}
+
+		/*
+		 * Create an iovec for (a part of) this page.
+		 */
+		cur_len = min_t(u32, PAGE_SIZE - sg_off, sg_left);
+		BUG_ON(!cur_len);
+
+		kunmap(pfn_to_page(sg_pfn));
+		data_length -= cur_len;
+		sg_left -= cur_len;
+
+		/*
+		 * The next pfn is mapped from the beginning.
+		 */
+		sg_pfn++;
+		sg_off = 0;
+	}
 
-	for (i = 0; i < cmd->kmapped_nents; i++)
-		kunmap(sg_page(&sg[i]));
+	BUG_ON(sg_left);
+	cmd->kmapped_length = 0;
 }
 
 static void iscsit_ack_from_expstatsn(struct iscsi_conn *conn, u32 exp_statsn)
diff --git a/drivers/target/iscsi/iscsi_target_core.h b/drivers/target/iscsi/iscsi_target_core.h
index 8a908b2..13bdaf1 100644
--- a/drivers/target/iscsi/iscsi_target_core.h
+++ b/drivers/target/iscsi/iscsi_target_core.h
@@ -472,7 +472,7 @@ struct iscsi_cmd {
 
 	struct scatterlist	*first_data_sg;
 	u32			first_data_sg_off;
-	u32			kmapped_nents;
+	u32			kmapped_length;
 
 }  ____cacheline_aligned;
 
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks
  2012-09-05 15:13 [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Paolo Bonzini
  2012-09-05 15:13 ` [RFC PATCH 1/3] tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg Paolo Bonzini
  2012-09-05 15:13 ` [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist Paolo Bonzini
@ 2012-09-05 15:13 ` Paolo Bonzini
  2012-09-06  2:19   ` Nicholas A. Bellinger
  2012-09-06  1:58 ` [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Nicholas A. Bellinger
  3 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2012-09-05 15:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: target-devel

transport_generic_get_mem's strategy of allocating pages one-by-one
for the data scatterlist may fail for even not-so-big data transfers
if the underlying device poses a small limit on the number of segments.

This patch fixes this problem by trying to allocate pages in relatively
large groups (at most 512 pages right now, equivalent to 2 MB per
allocation), but still without any slack.  The remainder is then
satisfied with subsequent smaller allocations.  For example, if the
data to be transferred is 132 KB, the new algorithm will attempt
a single 128 KB allocation, followed by a 4 KB allocation.
In case a contiguous block cannot be found, it will fall back to
a progressively smaller order.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/target/target_core_transport.c |   58 +++++++++++++++++++++++++++-----
 1 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 09028af..6e90e2c 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2058,7 +2058,7 @@ static inline void transport_free_sgl(struct scatterlist *sgl, int nents)
 	int count;
 
 	for_each_sg(sgl, sg, nents, count)
-		__free_page(sg_page(sg));
+		__free_pages(sg_page(sg), get_order(sg->length));
 
 	kfree(sgl);
 }
@@ -2229,41 +2229,81 @@ void transport_kunmap_data_sg(struct se_cmd *cmd)
 }
 EXPORT_SYMBOL(transport_kunmap_data_sg);
 
+#define TRANSPORT_MAX_ORDER	9
+
+static inline int
+get_order_no_slack(u32 length)
+{
+	/*
+	 * get_order gives the smallest order that length fits in.
+	 * We want instead the largest order that does not give
+	 * any slack.
+	 *
+	 * Example (with PAGE_SIZE = 4096, PAGE_SHIFT = 12):
+	 * 	get_order_no_slack(0x1000) = fls(0) = 0, get_order(0x1000) = 0
+	 * 	get_order_no_slack(0x1001) = fls(0) = 0, get_order(0x1001) = 1
+	 * 	get_order_no_slack(0x1fff) = fls(0) = 0, get_order(0x1fff) = 1
+	 * 	get_order_no_slack(0x2000) = fls(1) = 1, get_order(0x2000) = 1
+	 * 	get_order_no_slack(0x3000) = fls(1) = 1, get_order(0x3000) = 2
+	 * 	get_order_no_slack(0x4000) = fls(2) = 2, get_order(0x4000) = 2
+	 * 	get_order_no_slack(0x8000) = fls(4) = 3, get_order(0x8000) = 3
+	 */
+	length >>= PAGE_SHIFT + 1;
+	return fls(length);
+}
+
 static int
 transport_generic_get_mem(struct se_cmd *cmd)
 {
 	u32 length = cmd->data_length;
+	u32 page_len;
 	unsigned int nents;
 	struct page *page;
 	gfp_t zero_flag;
 	int i = 0;
+	int order, max_order;
 
 	nents = DIV_ROUND_UP(length, PAGE_SIZE);
 	cmd->t_data_sg = kmalloc(sizeof(struct scatterlist) * nents, GFP_KERNEL);
 	if (!cmd->t_data_sg)
 		return -ENOMEM;
 
-	cmd->t_data_nents = nents;
 	sg_init_table(cmd->t_data_sg, nents);
 
 	zero_flag = cmd->se_cmd_flags & SCF_SCSI_DATA_CDB ? 0 : __GFP_ZERO;
 
+	max_order = TRANSPORT_MAX_ORDER;
 	while (length) {
-		u32 page_len = min_t(u32, length, PAGE_SIZE);
-		page = alloc_page(GFP_KERNEL | zero_flag);
-		if (!page)
-			goto out;
+		order = get_order_no_slack(length);
+		for (;;) {
+			order = min(order, max_order);
+			page_len = min_t(u32, length, PAGE_SIZE << order);
+			page = alloc_pages(GFP_KERNEL | zero_flag, order);
+			if (page)
+				break;
+
+			if (!order)
+				goto out;
+
+			/*
+			 * Allocation failed, only try with a smaller order
+			 * from this point.
+			 */
+			max_order = order - 1;
+		}
 
 		sg_set_page(&cmd->t_data_sg[i], page, page_len, 0);
 		length -= page_len;
 		i++;
 	}
+	cmd->t_data_nents = i;
+	sg_mark_end(&cmd->t_data_sg[i - 1]);
 	return 0;
 
 out:
-	while (i > 0) {
-		i--;
-		__free_page(sg_page(&cmd->t_data_sg[i]));
+	while (i-- > 0) {
+		struct scatterlist *sg = &cmd->t_data_sg[i];
+		__free_pages(sg_page(sg), get_order(sg->length));
 	}
 	kfree(cmd->t_data_sg);
 	cmd->t_data_sg = NULL;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations
  2012-09-05 15:13 [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Paolo Bonzini
                   ` (2 preceding siblings ...)
  2012-09-05 15:13 ` [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks Paolo Bonzini
@ 2012-09-06  1:58 ` Nicholas A. Bellinger
  2012-09-06  9:04   ` Paolo Bonzini
  3 siblings, 1 reply; 10+ messages in thread
From: Nicholas A. Bellinger @ 2012-09-06  1:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, target-devel, Christoph Hellwig

On Wed, 2012-09-05 at 17:13 +0200, Paolo Bonzini wrote:
> Hi all,
> 
> while testing PSCSI I noticed that even requests for a smallish amount
> of data (approximately 700 KB) failed due to an excessive number of
> segments in the request.  In fact, using alloc_page resulted in a
> completely fragmented request, with no merging of consecutive pages
> at all.
> 
> This patch series fixes this problem by using higher-order allocations
> to build the data scatterlist.  The problem is that iscsi assumes that the
> scatterlist consists of single pages, which is not true anymore.  So
> patch 2 has to introduce some relatively complicated changes to
> iscsi_map_iovec and iscsi_unmap_iovec.
> 

So enabling multi-page per SGL support is a feature that has been
dormant within target core for a long time.  It's about time that we
start taking advantage of it again.  ;)

> While doing this, I noticed something strange in iscsit_do_crypto_hash_sg.
> Patch 1 adds a warning about it.
> 

Mmmmm, looks like a separate bug with DataDigest enabled.  

> The approach may be completely wrong and it needs more testing anyway.
> Please review!
> 

Adding my comments inline.

Thanks Paolo!

--nab

> Paolo
> 
> Paolo Bonzini (3):
>   tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg
>   tcm_iscsi: support multiple sizes in the scatterlist
>   target: try satisfying memory requests with contiguous blocks
> 
>  drivers/target/iscsi/iscsi_target.c      |  106 +++++++++++++++++++++++++-----
>  drivers/target/iscsi/iscsi_target_core.h |    2 +-
>  drivers/target/target_core_transport.c   |   58 ++++++++++++++---
>  3 files changed, 138 insertions(+), 28 deletions(-)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks
  2012-09-05 15:13 ` [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks Paolo Bonzini
@ 2012-09-06  2:19   ` Nicholas A. Bellinger
  0 siblings, 0 replies; 10+ messages in thread
From: Nicholas A. Bellinger @ 2012-09-06  2:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, target-devel, Christoph Hellwig

On Wed, 2012-09-05 at 17:13 +0200, Paolo Bonzini wrote:
> transport_generic_get_mem's strategy of allocating pages one-by-one
> for the data scatterlist may fail for even not-so-big data transfers
> if the underlying device poses a small limit on the number of segments.
> 
> This patch fixes this problem by trying to allocate pages in relatively
> large groups (at most 512 pages right now, equivalent to 2 MB per
> allocation), but still without any slack.  The remainder is then
> satisfied with subsequent smaller allocations.  For example, if the
> data to be transferred is 132 KB, the new algorithm will attempt
> a single 128 KB allocation, followed by a 4 KB allocation.
> In case a contiguous block cannot be found, it will fall back to
> a progressively smaller order.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  drivers/target/target_core_transport.c |   58 +++++++++++++++++++++++++++-----
>  1 files changed, 49 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 09028af..6e90e2c 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -2058,7 +2058,7 @@ static inline void transport_free_sgl(struct scatterlist *sgl, int nents)
>  	int count;
>  
>  	for_each_sg(sgl, sg, nents, count)
> -		__free_page(sg_page(sg));
> +		__free_pages(sg_page(sg), get_order(sg->length));
>  
>  	kfree(sgl);
>  }
> @@ -2229,41 +2229,81 @@ void transport_kunmap_data_sg(struct se_cmd *cmd)
>  }
>  EXPORT_SYMBOL(transport_kunmap_data_sg);
>  
> +#define TRANSPORT_MAX_ORDER	9
> +
> +static inline int
> +get_order_no_slack(u32 length)
> +{
> +	/*
> +	 * get_order gives the smallest order that length fits in.
> +	 * We want instead the largest order that does not give
> +	 * any slack.
> +	 *
> +	 * Example (with PAGE_SIZE = 4096, PAGE_SHIFT = 12):
> +	 * 	get_order_no_slack(0x1000) = fls(0) = 0, get_order(0x1000) = 0
> +	 * 	get_order_no_slack(0x1001) = fls(0) = 0, get_order(0x1001) = 1
> +	 * 	get_order_no_slack(0x1fff) = fls(0) = 0, get_order(0x1fff) = 1
> +	 * 	get_order_no_slack(0x2000) = fls(1) = 1, get_order(0x2000) = 1
> +	 * 	get_order_no_slack(0x3000) = fls(1) = 1, get_order(0x3000) = 2
> +	 * 	get_order_no_slack(0x4000) = fls(2) = 2, get_order(0x4000) = 2
> +	 * 	get_order_no_slack(0x8000) = fls(4) = 3, get_order(0x8000) = 3
> +	 */
> +	length >>= PAGE_SHIFT + 1;
> +	return fls(length);
> +}
> +
>  static int
>  transport_generic_get_mem(struct se_cmd *cmd)
>  {
>  	u32 length = cmd->data_length;
> +	u32 page_len;
>  	unsigned int nents;
>  	struct page *page;
>  	gfp_t zero_flag;
>  	int i = 0;
> +	int order, max_order;
>  
>  	nents = DIV_ROUND_UP(length, PAGE_SIZE);
>  	cmd->t_data_sg = kmalloc(sizeof(struct scatterlist) * nents, GFP_KERNEL);
>  	if (!cmd->t_data_sg)
>  		return -ENOMEM;
>  
> -	cmd->t_data_nents = nents;
>  	sg_init_table(cmd->t_data_sg, nents);
>  
>  	zero_flag = cmd->se_cmd_flags & SCF_SCSI_DATA_CDB ? 0 : __GFP_ZERO;
>  
> +	max_order = TRANSPORT_MAX_ORDER;

Please go ahead and turn max_order into an DEF_DEV_ATTRIB within
target_core_configfs.c, so we're able to change this value on the fly
using an per device configfs attr.

>  	while (length) {
> -		u32 page_len = min_t(u32, length, PAGE_SIZE);
> -		page = alloc_page(GFP_KERNEL | zero_flag);
> -		if (!page)
> -			goto out;
> +		order = get_order_no_slack(length);
> +		for (;;) {
> +			order = min(order, max_order);
> +			page_len = min_t(u32, length, PAGE_SIZE << order);
> +			page = alloc_pages(GFP_KERNEL | zero_flag, order);
> +			if (page)
> +				break;
> +
> +			if (!order)
> +				goto out;
> +
> +			/*
> +			 * Allocation failed, only try with a smaller order
> +			 * from this point.
> +			 */
> +			max_order = order - 1;
> +		}

Seems reasonable to back-off in reverse max_order in the face of large
order allocation failures..

>  
>  		sg_set_page(&cmd->t_data_sg[i], page, page_len, 0);
>  		length -= page_len;
>  		i++;
>  	}
> +	cmd->t_data_nents = i;
> +	sg_mark_end(&cmd->t_data_sg[i - 1]);
>  	return 0;
>  
>  out:
> -	while (i > 0) {
> -		i--;
> -		__free_page(sg_page(&cmd->t_data_sg[i]));
> +	while (i-- > 0) {
> +		struct scatterlist *sg = &cmd->t_data_sg[i];
> +		__free_pages(sg_page(sg), get_order(sg->length));
>  	}
>  	kfree(cmd->t_data_sg);
>  	cmd->t_data_sg = NULL;

So what I'd like to see for this patch in the short term is a new
target_core_fabrics_op bit that is (by default) assuming single order
SGL page allocations, that can optionally be enabled for iscsi-target +
other fabrics that support it as we bring other multi-page SGLs fabric
drivers online.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist
  2012-09-05 15:13 ` [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist Paolo Bonzini
@ 2012-09-06  2:33   ` Nicholas A. Bellinger
  0 siblings, 0 replies; 10+ messages in thread
From: Nicholas A. Bellinger @ 2012-09-06  2:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, target-devel, Andy Grover, Christoph Hellwig,
	Roland Dreier

On Wed, 2012-09-05 at 17:13 +0200, Paolo Bonzini wrote:
> The next patch will use multiple orders to satisfy the allocation
> of the data buffers.
> 
> We need to support this in iscsi_map_iovec and iscsi_unmap_iovec.
> The idea here is to walk each relevant page in the scatterlist
> (which may not be every page in the scatterlist due to data_offset)
> and kmap it.  iscsi_unmap_iovec uses the same algorithm, so that
> each kunmap maps the kmap that was used in iscsi_map_iovec.
> 

CC'ing Agrover here as he last touched the iscsit_[map,unmap]_iovec()
before the mainline merge of this code..

Also, this type of change is going to require other Reviewed-By's and
serious fio+writeverify Tested-By's to make sure we get all of the
corner cases as this enabled for upstream iscsi-target.

I'll be having a deeper look sometime next week, but as long as we can
selectively disable this for fabrics I don't have a problem considering
it for-3.7 code.

Nice work Paolo!

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  drivers/target/iscsi/iscsi_target.c      |  106 ++++++++++++++++++++++++-----
>  drivers/target/iscsi/iscsi_target_core.h |    2 +-
>  2 files changed, 89 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> index 224679e..dc4f5da 100644
> --- a/drivers/target/iscsi/iscsi_target.c
> +++ b/drivers/target/iscsi/iscsi_target.c
> @@ -684,45 +684,115 @@ static int iscsit_map_iovec(
>  	u32 data_offset,
>  	u32 data_length)
>  {
> -	u32 i = 0;
>  	struct scatterlist *sg;
> -	unsigned int page_off;
> +	u32 sg_off, sg_left, cur_len;
> +	u32 sg_pfn = 0;
> +	u32 i = 0;
>  
> -	/*
> -	 * We know each entry in t_data_sg contains a page.
> -	 */
> -	sg = &cmd->se_cmd.t_data_sg[data_offset / PAGE_SIZE];
> -	page_off = (data_offset % PAGE_SIZE);
> +	sg = cmd->se_cmd.t_data_sg;
> +	while (data_offset >= sg->length) {
> +		data_offset -= sg->length;
> +		sg++;
> +	}
>  
>  	cmd->first_data_sg = sg;
> -	cmd->first_data_sg_off = page_off;
> +	cmd->first_data_sg_off = sg->offset + data_offset;
> +	cmd->kmapped_length = data_length;
>  
> +	sg_left = 0;
>  	while (data_length) {
> -		u32 cur_len = min_t(u32, data_length, sg->length - page_off);
> +		if (!sg_left) {
> +			sg_off = sg->offset + data_offset;
> +			sg_pfn = page_to_pfn(sg_page(sg));
> +			sg_pfn += sg_off >> PAGE_SHIFT;
> +			sg_off &= PAGE_SIZE - 1;
> +
> +			sg_left = min(sg->length, data_length);
> +
> +			/*
> +			 * The next scatterlist is mapped from the beginning.
> +			 */
> +			sg++;
> +			data_offset = 0;
> +		}
>  
> -		iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off;
> +		/*
> +		 * Create an iovec for (a part of) this page.
> +		 */
> +		cur_len = min_t(u32, PAGE_SIZE - sg_off, sg_left);
> +		BUG_ON(!cur_len);
> +
> +		iov[i].iov_base = kmap(pfn_to_page(sg_pfn)) + sg_off;
>  		iov[i].iov_len = cur_len;
> +		i++;
>  
>  		data_length -= cur_len;
> -		page_off = 0;
> -		sg = sg_next(sg);
> -		i++;
> -	}
> +		sg_left -= cur_len;
>  
> -	cmd->kmapped_nents = i;
> +		/*
> +		 * The next pfn is mapped from the beginning.
> +		 */
> +		sg_pfn++;
> +		sg_off = 0;
> +	}
>  
> +	BUG_ON(sg_left);
>  	return i;
>  }
>  
>  static void iscsit_unmap_iovec(struct iscsi_cmd *cmd)
>  {
> -	u32 i;
>  	struct scatterlist *sg;
> +	u32 data_offset, data_length;
> +	u32 sg_off, sg_left, cur_len;
> +	u32 sg_pfn = 0;
> +
> +	if (!cmd->kmapped_length)
> +		return;
>  
>  	sg = cmd->first_data_sg;
> +	data_offset = cmd->first_data_sg_off - sg->offset;
> +	data_length = cmd->kmapped_length;
> +
> +	/*
> +	 * This loop must mirror exactly what is done in iscsi_map_iovec.
> +	 */
> +	sg_left = 0;
> +	while (data_length) {
> +		if (!sg_left) {
> +			sg_off = sg->offset + data_offset;
> +			sg_pfn = page_to_pfn(sg_page(sg));
> +			sg_pfn += sg_off >> PAGE_SHIFT;
> +			sg_off &= PAGE_SIZE - 1;
> +
> +			sg_left = min(sg->length, data_length);
> +
> +			/*
> +			 * The next scatterlist is mapped from the beginning.
> +			 */
> +			sg++;
> +			data_offset = 0;
> +		}
> +
> +		/*
> +		 * Create an iovec for (a part of) this page.
> +		 */
> +		cur_len = min_t(u32, PAGE_SIZE - sg_off, sg_left);
> +		BUG_ON(!cur_len);
> +
> +		kunmap(pfn_to_page(sg_pfn));
> +		data_length -= cur_len;
> +		sg_left -= cur_len;
> +
> +		/*
> +		 * The next pfn is mapped from the beginning.
> +		 */
> +		sg_pfn++;
> +		sg_off = 0;
> +	}
>  
> -	for (i = 0; i < cmd->kmapped_nents; i++)
> -		kunmap(sg_page(&sg[i]));
> +	BUG_ON(sg_left);
> +	cmd->kmapped_length = 0;
>  }
>  
>  static void iscsit_ack_from_expstatsn(struct iscsi_conn *conn, u32 exp_statsn)
> diff --git a/drivers/target/iscsi/iscsi_target_core.h b/drivers/target/iscsi/iscsi_target_core.h
> index 8a908b2..13bdaf1 100644
> --- a/drivers/target/iscsi/iscsi_target_core.h
> +++ b/drivers/target/iscsi/iscsi_target_core.h
> @@ -472,7 +472,7 @@ struct iscsi_cmd {
>  
>  	struct scatterlist	*first_data_sg;
>  	u32			first_data_sg_off;
> -	u32			kmapped_nents;
> +	u32			kmapped_length;
>  
>  }  ____cacheline_aligned;
>  



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations
  2012-09-06  1:58 ` [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Nicholas A. Bellinger
@ 2012-09-06  9:04   ` Paolo Bonzini
  2012-09-06 18:52     ` Nicholas A. Bellinger
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2012-09-06  9:04 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-kernel, target-devel, Christoph Hellwig

Il 06/09/2012 03:58, Nicholas A. Bellinger ha scritto:
>> This patch series fixes this problem by using higher-order allocations
>> to build the data scatterlist.  The problem is that iscsi assumes that the
>> scatterlist consists of single pages, which is not true anymore.  So
>> patch 2 has to introduce some relatively complicated changes to
>> iscsi_map_iovec and iscsi_unmap_iovec.
> 
> So enabling multi-page per SGL support is a feature that has been
> dormant within target core for a long time.  It's about time that we
> start taking advantage of it again.  ;)

Yeah, I noticed some preparation for it in tcm_fc/tfc_io.c, though too
late (they look a lot like my iscsi changes, it would have saved me some
time!).

While this is obviously not to be taken lightly, I disagree with making
this a per-fabric choice.  With a properly organized (and bisectable)
series, it should be relatively easy to review and to get right.  I
looked a bit more closely now and there are no changes needed to other
targets (actually there is a change needed in tcm_qla2xxx, but the code
is currently disabled).

There are however changes to transport_kmap_data_sg needed and a few
other places.

I definitely agree with your other comments, including making max_order
a DEF_DEV_ATTRIB.  In addition, the default max_order should be capped
based on queue_max_sectors(q) if applicable, to avoid hitting this scenario:

       /*
        * XXX: if the length the device accepts is shorter than the
        *      length of the S/G list entry this will cause and
        *      endless loop.  Better hope no driver uses huge pages.
        */

Paolo

>> While doing this, I noticed something strange in iscsit_do_crypto_hash_sg.
>> Patch 1 adds a warning about it.
>>
> 
> Mmmmm, looks like a separate bug with DataDigest enabled.  
> 
>> The approach may be completely wrong and it needs more testing anyway.
>> Please review!
>>
> 
> Adding my comments inline.
> 
> Thanks Paolo!
> 
> --nab
> 
>> Paolo
>>
>> Paolo Bonzini (3):
>>   tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg
>>   tcm_iscsi: support multiple sizes in the scatterlist
>>   target: try satisfying memory requests with contiguous blocks
>>
>>  drivers/target/iscsi/iscsi_target.c      |  106 +++++++++++++++++++++++++-----
>>  drivers/target/iscsi/iscsi_target_core.h |    2 +-
>>  drivers/target/target_core_transport.c   |   58 ++++++++++++++---
>>  3 files changed, 138 insertions(+), 28 deletions(-)
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe target-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations
  2012-09-06  9:04   ` Paolo Bonzini
@ 2012-09-06 18:52     ` Nicholas A. Bellinger
  2012-09-06 20:49       ` Paolo Bonzini
  0 siblings, 1 reply; 10+ messages in thread
From: Nicholas A. Bellinger @ 2012-09-06 18:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, target-devel, Christoph Hellwig

On Thu, 2012-09-06 at 11:04 +0200, Paolo Bonzini wrote:
> Il 06/09/2012 03:58, Nicholas A. Bellinger ha scritto:
> >> This patch series fixes this problem by using higher-order allocations
> >> to build the data scatterlist.  The problem is that iscsi assumes that the
> >> scatterlist consists of single pages, which is not true anymore.  So
> >> patch 2 has to introduce some relatively complicated changes to
> >> iscsi_map_iovec and iscsi_unmap_iovec.
> > 
> > So enabling multi-page per SGL support is a feature that has been
> > dormant within target core for a long time.  It's about time that we
> > start taking advantage of it again.  ;)
> 
> Yeah, I noticed some preparation for it in tcm_fc/tfc_io.c, though too
> late (they look a lot like my iscsi changes, it would have saved me some
> time!).
> 
> While this is obviously not to be taken lightly, I disagree with making
> this a per-fabric choice.  With a properly organized (and bisectable)
> series, it should be relatively easy to review and to get right.

It's a temporary bit that allows us to figure out which fabrics can
(safely) be enabled for multi-page SGLs operation for the short term
within for-next code.

Unless your prepared to commit to fio+writeverify'ing 8x mainline fabric
drivers in many different types of fabric dependent I/O combination for
high order allocations, I'd still prefer to have some way to disable
this optimization in a per fabric basis if we really need too.

That way we can just disable a problematic fabric instead of having to
revert the whole thing if users run into problems with a specific fabric
module late during the cycle.  If the other fabric maintainers are OK
with enabling this in their code and give their Reviewed-By's +
Tested-By's, then I have no problem dropping this extra bit once
everything has been converted.

>  I looked a bit more closely now and there are no changes needed to other
> targets (actually there is a change needed in tcm_qla2xxx, but the code
> is currently disabled).
> 
> There are however changes to transport_kmap_data_sg needed and a few
> other places.
> 
> I definitely agree with your other comments, including making max_order
> a DEF_DEV_ATTRIB.  In addition, the default max_order should be capped
> based on queue_max_sectors(q) if applicable, to avoid hitting this scenario:
> 
>        /*
>         * XXX: if the length the device accepts is shorter than the
>         *      length of the S/G list entry this will cause and
>         *      endless loop.  Better hope no driver uses huge pages.
>         */
> 

Mmmmm, indeed.  Also, I'm not sure that every old SCSI LLD is smart
enough to handle high older allocations -> multi-page SGLs either..


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations
  2012-09-06 18:52     ` Nicholas A. Bellinger
@ 2012-09-06 20:49       ` Paolo Bonzini
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2012-09-06 20:49 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: linux-kernel, target-devel, Christoph Hellwig

Il 06/09/2012 20:52, Nicholas A. Bellinger ha scritto:
> That way we can just disable a problematic fabric instead of having to
> revert the whole thing if users run into problems with a specific fabric
> module late during the cycle.  If the other fabric maintainers are OK
> with enabling this in their code and give their Reviewed-By's +
> Tested-By's, then I have no problem dropping this extra bit once
> everything has been converted.

Fair enough.

> Mmmmm, indeed.  Also, I'm not sure that every old SCSI LLD is smart
> enough to handle high older allocations -> multi-page SGLs either..

If not, they should simply set max_segment_size to 4096.

Paolo


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-09-06 20:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-05 15:13 [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Paolo Bonzini
2012-09-05 15:13 ` [RFC PATCH 1/3] tcm_iscsi: warn on incorrect precondition for iscsit_do_crypto_hash_sg Paolo Bonzini
2012-09-05 15:13 ` [RFC PATCH 2/3] tcm_iscsi: support multiple sizes in the scatterlist Paolo Bonzini
2012-09-06  2:33   ` Nicholas A. Bellinger
2012-09-05 15:13 ` [RFC PATCH 3/3] target: try satisfying memory requests with contiguous blocks Paolo Bonzini
2012-09-06  2:19   ` Nicholas A. Bellinger
2012-09-06  1:58 ` [RFC PATCH 0/3] target: try satisfying memory requests with higher-order allocations Nicholas A. Bellinger
2012-09-06  9:04   ` Paolo Bonzini
2012-09-06 18:52     ` Nicholas A. Bellinger
2012-09-06 20:49       ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).