* [PATCH 0/1] btrfs-progs: fi usage: implement raid56
@ 2019-03-17 12:51 stephane_btrfs
  2019-03-17 12:51 ` [PATCH 1/1] " stephane_btrfs
  0 siblings, 1 reply; 8+ messages in thread
From: stephane_btrfs @ 2019-03-17 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Stéphane Lesimple

From: Stéphane Lesimple <stephane_btrfs@lesimple.fr>

For several years now, we've been missing RAID56
code in the `filesystem usage' btrfs subcommand.

This is my attempt at implementing it. I don't have
any production array using raid5 or raid6 yet, but
I've tried different combinations of loop-device
test filesystems, with different numbers of devices
of different sizes, and the reported information
always seems correct.

Comments are obviously welcome.

The diff can also be seen at:
https://github.com/kdave/btrfs-progs/compare/devel...speed47:raid56

Stéphane Lesimple (1):
  btrfs-progs: fi usage: implement raid56

 cmds-fi-usage.c | 96 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 60 insertions(+), 36 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] btrfs-progs: fi usage: implement raid56
  2019-03-17 12:51 [PATCH 0/1] btrfs-progs: fi usage: implement raid56 stephane_btrfs
@ 2019-03-17 12:51 ` stephane_btrfs
  2019-03-17 18:41   ` Goffredo Baroncelli
  2019-03-17 23:23   ` Hans van Kranenburg
  0 siblings, 2 replies; 8+ messages in thread
From: stephane_btrfs @ 2019-03-17 12:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Stéphane Lesimple

From: Stéphane Lesimple <stephane_btrfs@lesimple.fr>

Implement RAID5 and RAID6 support in `filesystem usage'.

Before:
> WARNING: RAID56 detected, not implemented
> WARNING: RAID56 detected, not implemented
> WARNING: RAID56 detected, not implemented
>      Device allocated:     0.00B
>      Device unallocated:   2.04GiB
>      Used:                 0.00B
>      Free (estimated):     0.00B      (min: 8.00EiB)
>      Data ratio:           0.00
>      Metadata ratio:       0.00

After:
>      Device allocated:     1.83GiB
>      Device unallocated:   211.50MiB
>      Used:                 1.83GiB
>      Free (estimated):     128.44MiB  (min: 128.44MiB)
>      Data ratio:           1.65
>      Metadata ratio:       1.20

Note that this might need some fine-tuning for mixed mode.

We may also overestimate the free space in some
complex disk configurations, i.e. when some of the
unallocated space is in fact unallocatable because the
redundancy profile requirements can't be matched with
how the remaining unallocated space is distributed over
the devices. This is not specific to raid56 and also
happens with other raid profiles; it'll need to be
taken care of separately.

Signed-off-by: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
---
 cmds-fi-usage.c | 96 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 60 insertions(+), 36 deletions(-)

diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
index 9a23e176..baa1ff89 100644
--- a/cmds-fi-usage.c
+++ b/cmds-fi-usage.c
@@ -280,28 +280,6 @@ static struct btrfs_ioctl_space_args *load_space_info(int fd, const char *path)
 	return sargs;
 }
 
-/*
- * This function computes the space occupied by a *single* RAID5/RAID6 chunk.
- * The computation is performed on the basis of the number of stripes
- * which compose the chunk, which could be different from the number of devices
- * if a disk is added later.
- */
-static void get_raid56_used(struct chunk_info *chunks, int chunkcount,
-		u64 *raid5_used, u64 *raid6_used)
-{
-	struct chunk_info *info_ptr = chunks;
-	*raid5_used = 0;
-	*raid6_used = 0;
-
-	while (chunkcount-- > 0) {
-		if (info_ptr->type & BTRFS_BLOCK_GROUP_RAID5)
-			(*raid5_used) += info_ptr->size / (info_ptr->num_stripes - 1);
-		if (info_ptr->type & BTRFS_BLOCK_GROUP_RAID6)
-			(*raid6_used) += info_ptr->size / (info_ptr->num_stripes - 2);
-		info_ptr++;
-	}
-}
-
 #define	MIN_UNALOCATED_THRESH	SZ_16M
 static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
 		int chunkcount, struct device_info *devinfo, int devcount,
@@ -331,13 +309,15 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
 	double data_ratio;
 	double metadata_ratio;
 	/* logical */
-	u64 raid5_used = 0;
-	u64 raid6_used = 0;
+	u64 r_data_raid56_chunks = 0;
+	u64 l_data_raid56_chunks = 0;
+	u64 r_metadata_raid56_chunks = 0;
+	u64 l_metadata_raid56_chunks = 0;
 	u64 l_global_reserve = 0;
 	u64 l_global_reserve_used = 0;
 	u64 free_estimated = 0;
 	u64 free_min = 0;
-	int max_data_ratio = 1;
+	double max_data_ratio = 1;
 	int mixed = 0;
 
 	sargs = load_space_info(fd, path);
@@ -359,24 +339,29 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
 		ret = 1;
 		goto exit;
 	}
-	get_raid56_used(chunkinfo, chunkcount, &raid5_used, &raid6_used);
 
 	for (i = 0; i < sargs->total_spaces; i++) {
 		int ratio;
 		u64 flags = sargs->spaces[i].flags;
 
+		if ((flags & (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA))
+		    == (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA)) {
+			mixed = 1;
+		}
+
 		/*
 		 * The raid5/raid6 ratio depends by the stripes number
-		 * used by every chunk. It is computed separately
+		 * used by every chunk. It is computed separately,
+		 * after we're done with this loop, so skip those.
 		 */
 		if (flags & BTRFS_BLOCK_GROUP_RAID0)
 			ratio = 1;
 		else if (flags & BTRFS_BLOCK_GROUP_RAID1)
 			ratio = 2;
 		else if (flags & BTRFS_BLOCK_GROUP_RAID5)
-			ratio = 0;
+			continue;
 		else if (flags & BTRFS_BLOCK_GROUP_RAID6)
-			ratio = 0;
+			continue;
 		else if (flags & BTRFS_BLOCK_GROUP_DUP)
 			ratio = 2;
 		else if (flags & BTRFS_BLOCK_GROUP_RAID10)
@@ -384,9 +369,6 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
 		else
 			ratio = 1;
 
-		if (!ratio)
-			warning("RAID56 detected, not implemented");
-
 		if (ratio > max_data_ratio)
 			max_data_ratio = ratio;
 
@@ -394,10 +376,6 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
 			l_global_reserve = sargs->spaces[i].total_bytes;
 			l_global_reserve_used = sargs->spaces[i].used_bytes;
 		}
-		if ((flags & (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA))
-		    == (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA)) {
-			mixed = 1;
-		}
 		if (flags & BTRFS_BLOCK_GROUP_DATA) {
 			r_data_used += sargs->spaces[i].used_bytes * ratio;
 			r_data_chunks += sargs->spaces[i].total_bytes * ratio;
@@ -414,6 +392,52 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
 		}
 	}
 
+	/*
+	 * Here we compute the space occupied by every RAID5/RAID6 chunk.
+	 * The computation is performed on the basis of the number of stripes
+	 * which compose each chunk, which could be different from the number of devices
+	 * if a disk is added later, or if the disks are of different sizes.
+	 */
+	struct chunk_info *info_ptr;
+	for (info_ptr = chunkinfo, i = chunkcount; i > 0; i--, info_ptr++) {
+		int flags = info_ptr->type;
+		int nparity;
+
+		if (flags & BTRFS_BLOCK_GROUP_RAID5) {
+			nparity = 1;
+		}
+		else if (flags & BTRFS_BLOCK_GROUP_RAID6) {
+			nparity = 2;
+		}
+		else {
+			// We're only interested in raid56 here
+			continue;
+		}
+
+		if (flags & BTRFS_BLOCK_GROUP_DATA) {
+			r_data_raid56_chunks += info_ptr->size / (info_ptr->num_stripes - nparity);
+			l_data_raid56_chunks += info_ptr->size / info_ptr->num_stripes;
+		}
+		if (flags & BTRFS_BLOCK_GROUP_METADATA) {
+			r_metadata_raid56_chunks += info_ptr->size / (info_ptr->num_stripes - nparity);
+			l_metadata_raid56_chunks += info_ptr->size / info_ptr->num_stripes;
+		}
+	}
+
+	if (l_data_raid56_chunks > 0) {
+		double ratio = (double)r_data_raid56_chunks / l_data_raid56_chunks;
+		r_data_used += r_data_raid56_chunks;
+		r_data_chunks += r_data_raid56_chunks;
+		l_data_chunks += l_data_raid56_chunks;
+		if (ratio > max_data_ratio)
+			max_data_ratio = ratio;
+	}
+	if (l_metadata_raid56_chunks > 0) {
+		r_metadata_used += r_metadata_raid56_chunks;
+		r_metadata_chunks += r_metadata_raid56_chunks;
+		l_metadata_chunks += l_metadata_raid56_chunks;
+	}
+
 	r_total_chunks = r_data_chunks + r_system_chunks;
 	r_total_used = r_data_used + r_system_used;
 	if (!mixed) {
-- 
2.17.1



* Re: [PATCH 1/1] btrfs-progs: fi usage: implement raid56
  2019-03-17 12:51 ` [PATCH 1/1] " stephane_btrfs
@ 2019-03-17 18:41   ` Goffredo Baroncelli
  2019-03-18 19:12     ` Stéphane Lesimple
  2019-03-17 23:23   ` Hans van Kranenburg
  1 sibling, 1 reply; 8+ messages in thread
From: Goffredo Baroncelli @ 2019-03-17 18:41 UTC (permalink / raw)
  To: stephane_btrfs, linux-btrfs

On 17/03/2019 13.51, stephane_btrfs@lesimple.fr wrote:
[...]
> -		if (!ratio)
> -			warning("RAID56 detected, not implemented");
> -

IIRC one of the problems which led to this code was that a non-root user
cannot access the chunk info. So the (regular) user should be warned about
that: the printed info may not be correct.

[...]

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: [PATCH 1/1] btrfs-progs: fi usage: implement raid56
  2019-03-17 12:51 ` [PATCH 1/1] " stephane_btrfs
  2019-03-17 18:41   ` Goffredo Baroncelli
@ 2019-03-17 23:23   ` Hans van Kranenburg
  2019-03-18 19:41     ` Stéphane Lesimple
  1 sibling, 1 reply; 8+ messages in thread
From: Hans van Kranenburg @ 2019-03-17 23:23 UTC (permalink / raw)
  To: stephane_btrfs, linux-btrfs

Hi,

On 3/17/19 1:51 PM, stephane_btrfs@lesimple.fr wrote:
> From: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
> 
> Implement RAID5 and RAID6 support in `filesystem usage'.
> 
> Before:
>> WARNING: RAID56 detected, not implemented
>> WARNING: RAID56 detected, not implemented
>> WARNING: RAID56 detected, not implemented
>>      Device allocated:     0.00B
>>      Device unallocated:   2.04GiB
>>      Used:                 0.00B
>>      Free (estimated):     0.00B      (min: 8.00EiB)
>>      Data ratio:           0.00
>>      Metadata ratio:       0.00
> 
> After:
>>      Device allocated:     1.83GiB
>>      Device unallocated:   211.50MiB
>>      Used:                 1.83GiB
>>      Free (estimated):     128.44MiB  (min: 128.44MiB)
>>      Data ratio:           1.65
>>      Metadata ratio:       1.20
> 
> Note that this might need some fine-tuning for mixed mode.
> 
> We may also overestimate the free space incorrectly in
> some complex disk configurations, i.e. when some of the
> unallocated space is in fact unallocatable because the
> redundancy profile requisites can't be matched with how
> the remaining unallocated space is distributed over the
> devices. This is not specific to raid56 and also happens
> with other raid profiles, it'll need to be taken care of
> separately.
> 
> Signed-off-by: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
> ---
>  cmds-fi-usage.c | 96 ++++++++++++++++++++++++++++++-------------------
>  1 file changed, 60 insertions(+), 36 deletions(-)
> 
> diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
> index 9a23e176..baa1ff89 100644
> --- a/cmds-fi-usage.c
> +++ b/cmds-fi-usage.c
> @@ -280,28 +280,6 @@ static struct btrfs_ioctl_space_args *load_space_info(int fd, const char *path)
>  	return sargs;
>  }
>  
> -/*
> - * This function computes the space occupied by a *single* RAID5/RAID6 chunk.
> - * The computation is performed on the basis of the number of stripes
> - * which compose the chunk, which could be different from the number of devices
> - * if a disk is added later.
> - */
> -static void get_raid56_used(struct chunk_info *chunks, int chunkcount,
> -		u64 *raid5_used, u64 *raid6_used)
> -{
> -	struct chunk_info *info_ptr = chunks;
> -	*raid5_used = 0;
> -	*raid6_used = 0;
> -
> -	while (chunkcount-- > 0) {
> -		if (info_ptr->type & BTRFS_BLOCK_GROUP_RAID5)
> -			(*raid5_used) += info_ptr->size / (info_ptr->num_stripes - 1);
> -		if (info_ptr->type & BTRFS_BLOCK_GROUP_RAID6)
> -			(*raid6_used) += info_ptr->size / (info_ptr->num_stripes - 2);
> -		info_ptr++;
> -	}
> -}
> -
>  #define	MIN_UNALOCATED_THRESH	SZ_16M
>  static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
>  		int chunkcount, struct device_info *devinfo, int devcount,
> @@ -331,13 +309,15 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
>  	double data_ratio;
>  	double metadata_ratio;
>  	/* logical */
> -	u64 raid5_used = 0;
> -	u64 raid6_used = 0;
> +	u64 r_data_raid56_chunks = 0;
> +	u64 l_data_raid56_chunks = 0;
> +	u64 r_metadata_raid56_chunks = 0;
> +	u64 l_metadata_raid56_chunks = 0;

Can you explain what r and l mean? I have been staring at the code and
reading back and forth for some time, but I still really have no idea.

>  	u64 l_global_reserve = 0;
>  	u64 l_global_reserve_used = 0;
>  	u64 free_estimated = 0;
>  	u64 free_min = 0;
> -	int max_data_ratio = 1;
> +	double max_data_ratio = 1;
>  	int mixed = 0;
>  
>  	sargs = load_space_info(fd, path);
> @@ -359,24 +339,29 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
>  		ret = 1;
>  		goto exit;
>  	}
> -	get_raid56_used(chunkinfo, chunkcount, &raid5_used, &raid6_used);
>  
>  	for (i = 0; i < sargs->total_spaces; i++) {
>  		int ratio;
>  		u64 flags = sargs->spaces[i].flags;
>  
> +		if ((flags & (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA))
> +		    == (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA)) {
> +			mixed = 1;
> +		}
> +
>  		/*
>  		 * The raid5/raid6 ratio depends by the stripes number
> -		 * used by every chunk. It is computed separately
> +		 * used by every chunk. It is computed separately,
> +		 * after we're done with this loop, so skip those.
>  		 */
>  		if (flags & BTRFS_BLOCK_GROUP_RAID0)
>  			ratio = 1;
>  		else if (flags & BTRFS_BLOCK_GROUP_RAID1)
>  			ratio = 2;
>  		else if (flags & BTRFS_BLOCK_GROUP_RAID5)
> -			ratio = 0;
> +			continue;
>  		else if (flags & BTRFS_BLOCK_GROUP_RAID6)
> -			ratio = 0;
> +			continue;
>  		else if (flags & BTRFS_BLOCK_GROUP_DUP)
>  			ratio = 2;
>  		else if (flags & BTRFS_BLOCK_GROUP_RAID10)
> @@ -384,9 +369,6 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
>  		else
>  			ratio = 1;
>  
> -		if (!ratio)
> -			warning("RAID56 detected, not implemented");
> -
>  		if (ratio > max_data_ratio)
>  			max_data_ratio = ratio;
>  
> @@ -394,10 +376,6 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
>  			l_global_reserve = sargs->spaces[i].total_bytes;
>  			l_global_reserve_used = sargs->spaces[i].used_bytes;
>  		}
> -		if ((flags & (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA))
> -		    == (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA)) {
> -			mixed = 1;
> -		}
>  		if (flags & BTRFS_BLOCK_GROUP_DATA) {
>  			r_data_used += sargs->spaces[i].used_bytes * ratio;
>  			r_data_chunks += sargs->spaces[i].total_bytes * ratio;
> @@ -414,6 +392,52 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
>  		}
>  	}
>  
> +	/*
> +	 * Here we compute the space occupied by every RAID5/RAID6 chunks.
> +	 * The computation is performed on the basis of the number of stripes
> +	 * which compose each chunk, which could be different from the number of devices
> +	 * if a disk is added later, or if the disks are of different sizes..
> +	 */
> +	struct chunk_info *info_ptr;
> +	for (info_ptr = chunkinfo, i = chunkcount; i > 0; i--, info_ptr++) {
> +		int flags = info_ptr->type;
> +		int nparity;
> +
> +		if (flags & BTRFS_BLOCK_GROUP_RAID5) {
> +			nparity = 1;
> +		}
> +		else if (flags & BTRFS_BLOCK_GROUP_RAID6) {
> +			nparity = 2;
> +		}
> +		else {
> +			// We're only interested in raid56 here
> +			continue;
> +		}

I interpret the else continue as "ok, so apparently we can hit these
lines of code for other stuff than RAID56", which is expected and fine,
and then we'll silently ignore it.

But, in that case the value of nparity is not initialized, and the
lines below suggest that this part of the code is actually never
executed for anything that's not RAID56, so why the else up here? Or,
maybe print a big error or crash the program, if the else should never
happen?

> +
> +		if (flags & BTRFS_BLOCK_GROUP_DATA) {
> +			r_data_raid56_chunks += info_ptr->size / (info_ptr->num_stripes - nparity);

Ah, so this is the calculation of device extent size. Still wondering
what the r would mean.

> +			l_data_raid56_chunks += info_ptr->size / info_ptr->num_stripes;

I still don't understand what meaning this has. Chunk length divided by
num_stripes (with parity)...

A variable named data_raid56_chunks tells me that you are counting the
number of chunks that have type DATA and profile RAID5 or RAID6, but
this code is doing something really different; it's counting bytes. But
what sort of bytes...

> +		}
> +		if (flags & BTRFS_BLOCK_GROUP_METADATA) {
> +			r_metadata_raid56_chunks += info_ptr->size / (info_ptr->num_stripes - nparity);
> +			l_metadata_raid56_chunks += info_ptr->size / info_ptr->num_stripes;
> +		}
> +	}
> +
> +	if (l_data_raid56_chunks > 0) {
> +		double ratio = (double)r_data_raid56_chunks / l_data_raid56_chunks;
> +		r_data_used += r_data_raid56_chunks;
> +		r_data_chunks += r_data_raid56_chunks;
> +		l_data_chunks += l_data_raid56_chunks;
> +		if (ratio > max_data_ratio)
> +			max_data_ratio = ratio;
> +	}
> +	if (l_metadata_raid56_chunks > 0) {
> +		r_metadata_used += r_metadata_raid56_chunks;
> +		r_metadata_chunks += r_metadata_raid56_chunks;
> +		l_metadata_chunks += l_metadata_raid56_chunks;
> +	}
> +
>  	r_total_chunks = r_data_chunks + r_system_chunks;
>  	r_total_used = r_data_used + r_system_used;
>  	if (!mixed) {
> 

I'm not a super-experienced C coder, but I can read C and write patches
while looking at the history of the code in some place, and following
patterns. What I'm throwing at you here is some feedback about
readability. I'd like to help reviewing this, but I can't really
understand it.

Hans


* Re: [PATCH 1/1] btrfs-progs: fi usage: implement raid56
  2019-03-17 18:41   ` Goffredo Baroncelli
@ 2019-03-18 19:12     ` Stéphane Lesimple
  2019-03-18 21:05       ` Goffredo Baroncelli
  0 siblings, 1 reply; 8+ messages in thread
From: Stéphane Lesimple @ 2019-03-18 19:12 UTC (permalink / raw)
  To: kreijack; +Cc: linux-btrfs

Le 2019-03-17 19:41, Goffredo Baroncelli a écrit :
> On 17/03/2019 13.51, stephane_btrfs@lesimple.fr wrote:
> [...]
>> -		if (!ratio)
>> -			warning("RAID56 detected, not implemented");
>> -
> 
> IIRC one of the problem which lead to this code was that the not root
> user cannot
> access to chunkinfo. So the (regular) user should be warned about
> that: the printed info may be not correct.


I think you may be referring to this other warning:

         ret = load_chunk_info(fd, chunkinfo, chunkcount);
         if (ret == -EPERM) {
                 warning("cannot read detailed chunk info, per-device usage will not be shown, run as root");

This one is still present, and matches exactly what you say:
it fires when btrfs fi usage is run under a non-root user.

The "RAID56 detected, not implemented" warning, on the other hand, is
displayed even under the root user, and indicates that it's normal that
most of the counters of "filesystem usage" are at zero, because the
code to compute these properly under raid5/raid6 isn't there yet; it's
actually precisely this missing feature that my patch tries to add!

Thanks,

Stéphane.


* Re: [PATCH 1/1] btrfs-progs: fi usage: implement raid56
  2019-03-17 23:23   ` Hans van Kranenburg
@ 2019-03-18 19:41     ` Stéphane Lesimple
  2019-03-19  1:14       ` Hans van Kranenburg
  0 siblings, 1 reply; 8+ messages in thread
From: Stéphane Lesimple @ 2019-03-18 19:41 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: linux-btrfs

Hi Hans,

Le 2019-03-18 00:23, Hans van Kranenburg a écrit :
> Hi,
> 
> On 3/17/19 1:51 PM, stephane_btrfs@lesimple.fr wrote:
>> From: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
>> 
>> Implement RAID5 and RAID6 support in `filesystem usage'.
>> 
>> Before:
>>> WARNING: RAID56 detected, not implemented
>>> WARNING: RAID56 detected, not implemented
>>> WARNING: RAID56 detected, not implemented
>>>      Device allocated:     0.00B
>>>      Device unallocated:   2.04GiB
>>>      Used:                 0.00B
>>>      Free (estimated):     0.00B      (min: 8.00EiB)
>>>      Data ratio:           0.00
>>>      Metadata ratio:       0.00
>> 
>> After:
>>>      Device allocated:     1.83GiB
>>>      Device unallocated:   211.50MiB
>>>      Used:                 1.83GiB
>>>      Free (estimated):     128.44MiB  (min: 128.44MiB)
>>>      Data ratio:           1.65
>>>      Metadata ratio:       1.20
>> 
>> Note that this might need some fine-tuning for mixed mode.
>> 
>> We may also overestimate the free space incorrectly in
>> some complex disk configurations, i.e. when some of the
>> unallocated space is in fact unallocatable because the
>> redundancy profile requisites can't be matched with how
>> the remaining unallocated space is distributed over the
>> devices. This is not specific to raid56 and also happens
>> with other raid profiles, it'll need to be taken care of
>> separately.
>> 
>> Signed-off-by: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
>> ---
>>  cmds-fi-usage.c | 96 
>> ++++++++++++++++++++++++++++++-------------------
>>  1 file changed, 60 insertions(+), 36 deletions(-)
>> 
>> diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
>> index 9a23e176..baa1ff89 100644
>> --- a/cmds-fi-usage.c
>> +++ b/cmds-fi-usage.c
>> @@ -280,28 +280,6 @@ static struct btrfs_ioctl_space_args 
>> *load_space_info(int fd, const char *path)
>>  	return sargs;
>>  }
>> 
>> -/*
>> - * This function computes the space occupied by a *single* 
>> RAID5/RAID6 chunk.
>> - * The computation is performed on the basis of the number of stripes
>> - * which compose the chunk, which could be different from the number 
>> of devices
>> - * if a disk is added later.
>> - */
>> -static void get_raid56_used(struct chunk_info *chunks, int 
>> chunkcount,
>> -		u64 *raid5_used, u64 *raid6_used)
>> -{
>> -	struct chunk_info *info_ptr = chunks;
>> -	*raid5_used = 0;
>> -	*raid6_used = 0;
>> -
>> -	while (chunkcount-- > 0) {
>> -		if (info_ptr->type & BTRFS_BLOCK_GROUP_RAID5)
>> -			(*raid5_used) += info_ptr->size / (info_ptr->num_stripes - 1);
>> -		if (info_ptr->type & BTRFS_BLOCK_GROUP_RAID6)
>> -			(*raid6_used) += info_ptr->size / (info_ptr->num_stripes - 2);
>> -		info_ptr++;
>> -	}
>> -}
>> -
>>  #define	MIN_UNALOCATED_THRESH	SZ_16M
>>  static int print_filesystem_usage_overall(int fd, struct chunk_info 
>> *chunkinfo,
>>  		int chunkcount, struct device_info *devinfo, int devcount,
>> @@ -331,13 +309,15 @@ static int print_filesystem_usage_overall(int 
>> fd, struct chunk_info *chunkinfo,
>>  	double data_ratio;
>>  	double metadata_ratio;
>>  	/* logical */
>> -	u64 raid5_used = 0;
>> -	u64 raid6_used = 0;
>> +	u64 r_data_raid56_chunks = 0;
>> +	u64 l_data_raid56_chunks = 0;
>> +	u64 r_metadata_raid56_chunks = 0;
>> +	u64 l_metadata_raid56_chunks = 0;
> 
> Can you explain what r and l mean? I have been staring at the code and
> reading back and forth for some time, but I still really have no idea.

I've tried to stick with the naming convention of the other variables
used in this function. Some lines above this diff, you'll find the
following comment in the original file, followed by some variable
declarations:

/*
  * r_* prefix is for raw data
  * l_* is for logical
  */
  [...]
  u64 r_data_chunks = 0;
  u64 l_data_chunks = 0;
  u64 r_metadata_used = 0;
  u64 r_metadata_chunks = 0;
  u64 l_metadata_chunks = 0;

This is why I've also named my variables this way.
If you take, for example, a chunk with a raid1 replication policy, its
real (r_) used size will be 2GiB while its logical (l_) size will be
1GiB.

>>  	u64 l_global_reserve = 0;
>>  	u64 l_global_reserve_used = 0;
>>  	u64 free_estimated = 0;
>>  	u64 free_min = 0;
>> -	int max_data_ratio = 1;
>> +	double max_data_ratio = 1;
>>  	int mixed = 0;
>> 
>>  	sargs = load_space_info(fd, path);
>> @@ -359,24 +339,29 @@ static int print_filesystem_usage_overall(int 
>> fd, struct chunk_info *chunkinfo,
>>  		ret = 1;
>>  		goto exit;
>>  	}
>> -	get_raid56_used(chunkinfo, chunkcount, &raid5_used, &raid6_used);
>> 
>>  	for (i = 0; i < sargs->total_spaces; i++) {
>>  		int ratio;
>>  		u64 flags = sargs->spaces[i].flags;
>> 
>> +		if ((flags & (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA))
>> +		    == (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA)) {
>> +			mixed = 1;
>> +		}
>> +
>>  		/*
>>  		 * The raid5/raid6 ratio depends by the stripes number
>> -		 * used by every chunk. It is computed separately
>> +		 * used by every chunk. It is computed separately,
>> +		 * after we're done with this loop, so skip those.
>>  		 */
>>  		if (flags & BTRFS_BLOCK_GROUP_RAID0)
>>  			ratio = 1;
>>  		else if (flags & BTRFS_BLOCK_GROUP_RAID1)
>>  			ratio = 2;
>>  		else if (flags & BTRFS_BLOCK_GROUP_RAID5)
>> -			ratio = 0;
>> +			continue;
>>  		else if (flags & BTRFS_BLOCK_GROUP_RAID6)
>> -			ratio = 0;
>> +			continue;
>>  		else if (flags & BTRFS_BLOCK_GROUP_DUP)
>>  			ratio = 2;
>>  		else if (flags & BTRFS_BLOCK_GROUP_RAID10)
>> @@ -384,9 +369,6 @@ static int print_filesystem_usage_overall(int fd, 
>> struct chunk_info *chunkinfo,
>>  		else
>>  			ratio = 1;
>> 
>> -		if (!ratio)
>> -			warning("RAID56 detected, not implemented");
>> -
>>  		if (ratio > max_data_ratio)
>>  			max_data_ratio = ratio;
>> 
>> @@ -394,10 +376,6 @@ static int print_filesystem_usage_overall(int fd, 
>> struct chunk_info *chunkinfo,
>>  			l_global_reserve = sargs->spaces[i].total_bytes;
>>  			l_global_reserve_used = sargs->spaces[i].used_bytes;
>>  		}
>> -		if ((flags & (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA))
>> -		    == (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA)) {
>> -			mixed = 1;
>> -		}
>>  		if (flags & BTRFS_BLOCK_GROUP_DATA) {
>>  			r_data_used += sargs->spaces[i].used_bytes * ratio;
>>  			r_data_chunks += sargs->spaces[i].total_bytes * ratio;
>> @@ -414,6 +392,52 @@ static int print_filesystem_usage_overall(int fd, 
>> struct chunk_info *chunkinfo,
>>  		}
>>  	}
>> 
>> +	/*
>> +	 * Here we compute the space occupied by every RAID5/RAID6 chunks.
>> +	 * The computation is performed on the basis of the number of 
>> stripes
>> +	 * which compose each chunk, which could be different from the 
>> number of devices
>> +	 * if a disk is added later, or if the disks are of different 
>> sizes..
>> +	 */
>> +	struct chunk_info *info_ptr;
>> +	for (info_ptr = chunkinfo, i = chunkcount; i > 0; i--, info_ptr++) {
>> +		int flags = info_ptr->type;
>> +		int nparity;
>> +
>> +		if (flags & BTRFS_BLOCK_GROUP_RAID5) {
>> +			nparity = 1;
>> +		}
>> +		else if (flags & BTRFS_BLOCK_GROUP_RAID6) {
>> +			nparity = 2;
>> +		}
>> +		else {
>> +			// We're only interested in raid56 here
>> +			continue;
>> +		}
> 
> I interpret the else continue as "ok, so apparently we can hit these
> lines of code for other stuff than RAID56", which is expected and fine,
> and then we'll silently ignore it.

Yes, this for() loop will be entered in any case, because the
replication policy (raid5 or raid1) is not a property of the filesystem
as a whole, it's a property of a chunk. So you can have, in the same
filesystem, some chunks with a raid1 replication policy and some others
in raid5. This can happen if, for example, you start a balance with
-dconvert=raid5 on a previously raid1-only filesystem and stop it after
a few chunks have been converted. This is perfectly valid from a btrfs
standpoint.

So, this for() loop's job is to go through each chunkinfo and process
only the raid5/raid6 ones, because the other replication profiles are
already handled by the code above this for(): for every profile except
raid5/raid6, one doesn't need to go through each chunk. For raid1 we
know that the ratio is exactly 2, because that's the very definition of
btrfs-raid1. For raid5/6 on the other hand, it depends on the number of
stripes the chunk has, and this can change over time (as disks are
added, or if disks are not of the same size).

> But, in that case the value of nparity is not initialized and the lines
> below seem like this part of the code is actually never executed for
> anything that's not RAID56, so why the else up here? Or, maube print a
> big error or crash the program, if the else never should happen?

Actually, the else is there just to go check the next chunk in case
the one we're working on is not raid5 or raid6, because this for()
loop only counts the number of bytes in raid5/raid6 chunks (to later
be able to compute the ratio). We just want to skip all the others
(they've already been handled by preexisting code).
Also, as nparity is only used if we're inspecting a raid5/6 chunk,
there shouldn't be a case where it is referenced and not initialized
AFAICT.

> 
>> +
>> +		if (flags & BTRFS_BLOCK_GROUP_DATA) {
>> +			r_data_raid56_chunks += info_ptr->size / (info_ptr->num_stripes - 
>> nparity);
> 
> Ah, so this is the calculation of device extent size. Still wondering
> what the r would mean.

Here, we're counting the number of bytes physically used on the
devices by raid5 or raid6 chunks.

> 
>> +			l_data_raid56_chunks += info_ptr->size / info_ptr->num_stripes;
> 
> I still don't understand what meaning this has. Chunk length divided by
> num_stripes (with parity)...
> 
> A variable named data_raid56_chunks tells me that you are counting the
> amount of chunks that have type DATA and profile RAID5 or RAID6, but
> this code is doing something really different, it's counting bytes. But
> what sort of bytes...

It's a bit misleading indeed; actually, a previous version of my patch
had this variable named "r_data_raid56_size", but I noticed that in
preexisting code, "r_data_chunks" is used to count the number of bytes
(and not the number of chunks) used by non-raid5/6 chunks, as you can
see:

r_data_chunks += sargs->spaces[i].total_bytes * ratio;

And a similar line for the "l_" variant:

l_data_chunks += sargs->spaces[i].total_bytes;

So I've renamed my variables to stick with this naming convention.

>> +		}
>> +		if (flags & BTRFS_BLOCK_GROUP_METADATA) {
>> +			r_metadata_raid56_chunks += info_ptr->size / 
>> (info_ptr->num_stripes - nparity);
>> +			l_metadata_raid56_chunks += info_ptr->size / 
>> info_ptr->num_stripes;
>> +		}
>> +	}
>> +
>> +	if (l_data_raid56_chunks > 0) {
>> +		double ratio = (double)r_data_raid56_chunks / l_data_raid56_chunks;
>> +		r_data_used += r_data_raid56_chunks;
>> +		r_data_chunks += r_data_raid56_chunks;
>> +		l_data_chunks += l_data_raid56_chunks;
>> +		if (ratio > max_data_ratio)
>> +			max_data_ratio = ratio;
>> +	}
>> +	if (l_metadata_raid56_chunks > 0) {
>> +		r_metadata_used += r_metadata_raid56_chunks;
>> +		r_metadata_chunks += r_metadata_raid56_chunks;
>> +		l_metadata_chunks += l_metadata_raid56_chunks;
>> +	}
>> +
>>  	r_total_chunks = r_data_chunks + r_system_chunks;
>>  	r_total_used = r_data_used + r_system_used;
>>  	if (!mixed) {
>> 
> 
> I'm not a super-experienced C coder, but I can read C and write patches
> while looking at the history of the code in some place, and following
> patterns. What I'm throwing at you here is some feedback about
> readability. I'd like to help reviewing this, but I can't really
> understand it.

Thanks for your time reviewing this patch, and in any case I hope I've
shed some light on what it does exactly! Please don't hesitate to
comment back if anything is still not clear!

-- 
Stéphane.


* Re: [PATCH 1/1] btrfs-progs: fi usage: implement raid56
  2019-03-18 19:12     ` Stéphane Lesimple
@ 2019-03-18 21:05       ` Goffredo Baroncelli
  0 siblings, 0 replies; 8+ messages in thread
From: Goffredo Baroncelli @ 2019-03-18 21:05 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: linux-btrfs

On 18/03/2019 20.12, Stéphane Lesimple wrote:
> On 2019-03-17 19:41, Goffredo Baroncelli wrote:
>> On 17/03/2019 13.51, stephane_btrfs@lesimple.fr wrote:
>> [...]
>>> -        if (!ratio)
>>> -            warning("RAID56 detected, not implemented");
>>> -
>>
>> IIRC one of the problems which led to this code was that a non-root
>> user cannot access the chunkinfo. So the (regular) user should be
>> warned about that: the printed info may not be correct.
> 
> 
> I think you may be referencing this other warning:
> 
>         ret = load_chunk_info(fd, chunkinfo, chunkcount);
>         if (ret == -EPERM) {
>                 warning("cannot read detailed chunk info, per-device usage will not be shown, run as root");
> 
> This one is still present, and matches exactly what you say:
> it fires when btrfs fi usage is run under a non-root user.

Perfect! I missed this part.

BR
G.Baroncelli

> 
> The "RAID56 detected, not implemented" warning, on the other hand,
> is displayed even under the root user, and indicates that it's normal
> that most of the counters of "filesystem usage" are at zero, because
> the code to compute these properly under raid5/raid6 isn't there yet,
> and it's actually precisely this missing feature that my patch tries to add!
> 
> Thanks,
> 
> Stéphane.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: [PATCH 1/1] btrfs-progs: fi usage: implement raid56
  2019-03-18 19:41     ` Stéphane Lesimple
@ 2019-03-19  1:14       ` Hans van Kranenburg
  0 siblings, 0 replies; 8+ messages in thread
From: Hans van Kranenburg @ 2019-03-19  1:14 UTC (permalink / raw)
  To: Stéphane Lesimple; +Cc: linux-btrfs

Hi,

Thanks for your explanations. Also, I want to explicitly say it's great
you're working on this. If I sounded like I was complaining, it was
about the already existing code (the variable names).

So...

On 3/18/19 8:41 PM, Stéphane Lesimple wrote:
> Hi Hans,
> 
> On 2019-03-18 00:23, Hans van Kranenburg wrote:
>> Hi,
>>
>> On 3/17/19 1:51 PM, stephane_btrfs@lesimple.fr wrote:
>>> From: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
>>>
>>> Implement RAID5 and RAID6 support in `filesystem usage'.

In the commit message, a high level "bullet point" overview of what
changed would be useful. Like, what were the flaws, and how does the
chosen algorithm deal with them better?

I'm currently trying to understand the original code, and then the new
code, and then trying to reverse engineer what they do differently.

Something like:
* The old code presented X by looking at P and then multiplying it with
S. This is obviously wrong, because then we ignore T. Instead, divide it
by R, so that we never end up with W.
* By adding U, we are suddenly able to also calculate G, which is very
useful but was missing before. Yay!

>>>
>>> Before:
>>>> WARNING: RAID56 detected, not implemented
>>>> WARNING: RAID56 detected, not implemented
>>>> WARNING: RAID56 detected, not implemented
>>>>      Device allocated:     0.00B
>>>>      Device unallocated:   2.04GiB
>>>>      Used:                 0.00B
>>>>      Free (estimated):     0.00B      (min: 8.00EiB)
>>>>      Data ratio:           0.00
>>>>      Metadata ratio:       0.00
>>>
>>> After:
>>>>      Device allocated:     1.83GiB
>>>>      Device unallocated:   211.50MiB
>>>>      Used:                 1.83GiB
>>>>      Free (estimated):     128.44MiB  (min: 128.44MiB)
>>>>      Data ratio:           1.65
>>>>      Metadata ratio:       1.20
>>>
>>> Note that this might need some fine-tuning for mixed mode.
>>>
>>> We may also overestimate the free space incorrectly in
>>> some complex disk configurations, i.e. when some of the
>>> unallocated space is in fact unallocatable because the
>>> redundancy profile requisites can't be matched with how
>>> the remaining unallocated space is distributed over the
>>> devices. This is not specific to raid56 and also happens
>>> with other raid profiles, it'll need to be taken care of
>>> separately.
>>>
>>> Signed-off-by: Stéphane Lesimple <stephane_btrfs@lesimple.fr>
>>> ---
>>>  cmds-fi-usage.c | 96 ++++++++++++++++++++++++++++++-------------------
>>>  1 file changed, 60 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
>>> index 9a23e176..baa1ff89 100644
>>> --- a/cmds-fi-usage.c
>>> +++ b/cmds-fi-usage.c
>>> @@ -280,28 +280,6 @@ static struct btrfs_ioctl_space_args
>>> *load_space_info(int fd, const char *path)
>>>      return sargs;
>>>  }
>>>
>>> -/*
>>> - * This function computes the space occupied by a *single*
>>> RAID5/RAID6 chunk.
>>> - * The computation is performed on the basis of the number of stripes
>>> - * which compose the chunk, which could be different from the number
>>> of devices
>>> - * if a disk is added later.
>>> - */
>>> -static void get_raid56_used(struct chunk_info *chunks, int chunkcount,
>>> -        u64 *raid5_used, u64 *raid6_used)
>>> -{
>>> -    struct chunk_info *info_ptr = chunks;
>>> -    *raid5_used = 0;
>>> -    *raid6_used = 0;
>>> -
>>> -    while (chunkcount-- > 0) {
>>> -        if (info_ptr->type & BTRFS_BLOCK_GROUP_RAID5)
>>> -            (*raid5_used) += info_ptr->size / (info_ptr->num_stripes
>>> - 1);
>>> -        if (info_ptr->type & BTRFS_BLOCK_GROUP_RAID6)
>>> -            (*raid6_used) += info_ptr->size / (info_ptr->num_stripes
>>> - 2);
>>> -        info_ptr++;
>>> -    }
>>> -}
>>> -
>>>  #define    MIN_UNALOCATED_THRESH    SZ_16M
>>>  static int print_filesystem_usage_overall(int fd, struct chunk_info
>>> *chunkinfo,
>>>          int chunkcount, struct device_info *devinfo, int devcount,
>>> @@ -331,13 +309,15 @@ static int print_filesystem_usage_overall(int
>>> fd, struct chunk_info *chunkinfo,
>>>      double data_ratio;
>>>      double metadata_ratio;
>>>      /* logical */
>>> -    u64 raid5_used = 0;
>>> -    u64 raid6_used = 0;
>>> +    u64 r_data_raid56_chunks = 0;
>>> +    u64 l_data_raid56_chunks = 0;
>>> +    u64 r_metadata_raid56_chunks = 0;
>>> +    u64 l_metadata_raid56_chunks = 0;
>>
>> Can you explain what r and l mean? I have been staring at the code and
>> reading back and forth for some time, but I still really have no idea.
> 
> I've tried to stick with the naming convention of the other variables
> used in this function. Some lines above this diff, you'll find the
> following comment in the original file, followed by some variable
> declarations:
> 
> /*
>  * r_* prefix is for raw data
>  * l_* is for logical
>  */
>  [...]
>  u64 r_data_used = 0;
>  u64 r_data_chunks = 0;
>  u64 l_data_chunks = 0;
>  u64 r_metadata_used = 0;
>  u64 r_metadata_chunks = 0;
>  u64 l_metadata_chunks = 0;
> 
> This is why I've also named my vars this way.
> If you take for example a chunk with a raid1 replication policy, its
> raw (r_) used size will be 2G, and its logical (l_) size will be 1G.

Aha. So, for me, using names like these makes the code pretty hard to
follow, because, yes, data_chunks reads as "this counts the amount of
chunks", and then r_chunks and l_chunks seem like the amount of chunks
on the right disk and on the left disk. But, I should have read the
context first, before complaining, yes.

>>>      u64 l_global_reserve = 0;
>>>      u64 l_global_reserve_used = 0;
>>>      u64 free_estimated = 0;
>>>      u64 free_min = 0;
>>> -    int max_data_ratio = 1;
>>> +    double max_data_ratio = 1;
>>>      int mixed = 0;
>>>
>>>      sargs = load_space_info(fd, path);
>>> @@ -359,24 +339,29 @@ static int print_filesystem_usage_overall(int
>>> fd, struct chunk_info *chunkinfo,
>>>          ret = 1;
>>>          goto exit;
>>>      }
>>> -    get_raid56_used(chunkinfo, chunkcount, &raid5_used, &raid6_used);
>>>
>>>      for (i = 0; i < sargs->total_spaces; i++) {
>>>          int ratio;
>>>          u64 flags = sargs->spaces[i].flags;
>>>
>>> +        if ((flags & (BTRFS_BLOCK_GROUP_DATA |
>>> BTRFS_BLOCK_GROUP_METADATA))
>>> +            == (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA)) {
>>> +            mixed = 1;
>>> +        }
>>> +
>>>          /*
>>>           * The raid5/raid6 ratio depends by the stripes number
>>> -         * used by every chunk. It is computed separately
>>> +         * used by every chunk. It is computed separately,
>>> +         * after we're done with this loop, so skip those.
>>>           */
>>>          if (flags & BTRFS_BLOCK_GROUP_RAID0)
>>>              ratio = 1;
>>>          else if (flags & BTRFS_BLOCK_GROUP_RAID1)
>>>              ratio = 2;
>>>          else if (flags & BTRFS_BLOCK_GROUP_RAID5)
>>> -            ratio = 0;
>>> +            continue;
>>>          else if (flags & BTRFS_BLOCK_GROUP_RAID6)
>>> -            ratio = 0;
>>> +            continue;
>>>          else if (flags & BTRFS_BLOCK_GROUP_DUP)
>>>              ratio = 2;
>>>          else if (flags & BTRFS_BLOCK_GROUP_RAID10)
>>> @@ -384,9 +369,6 @@ static int print_filesystem_usage_overall(int fd,
>>> struct chunk_info *chunkinfo,
>>>          else
>>>              ratio = 1;
>>>
>>> -        if (!ratio)
>>> -            warning("RAID56 detected, not implemented");
>>> -
>>>          if (ratio > max_data_ratio)
>>>              max_data_ratio = ratio;
>>>
>>> @@ -394,10 +376,6 @@ static int print_filesystem_usage_overall(int
>>> fd, struct chunk_info *chunkinfo,
>>>              l_global_reserve = sargs->spaces[i].total_bytes;
>>>              l_global_reserve_used = sargs->spaces[i].used_bytes;
>>>          }
>>> -        if ((flags & (BTRFS_BLOCK_GROUP_DATA |
>>> BTRFS_BLOCK_GROUP_METADATA))
>>> -            == (BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA)) {
>>> -            mixed = 1;
>>> -        }
>>>          if (flags & BTRFS_BLOCK_GROUP_DATA) {
>>>              r_data_used += sargs->spaces[i].used_bytes * ratio;
>>>              r_data_chunks += sargs->spaces[i].total_bytes * ratio;
>>> @@ -414,6 +392,52 @@ static int print_filesystem_usage_overall(int
>>> fd, struct chunk_info *chunkinfo,
>>>          }
>>>      }
>>>
>>> +    /*
>>> +     * Here we compute the space occupied by every RAID5/RAID6 chunks.
>>> +     * The computation is performed on the basis of the number of
>>> stripes
>>> +     * which compose each chunk, which could be different from the
>>> number of devices
>>> +     * if a disk is added later, or if the disks are of different
>>> sizes..
>>> +     */
>>> +    struct chunk_info *info_ptr;
>>> +    for (info_ptr = chunkinfo, i = chunkcount; i > 0; i--,
>>> info_ptr++) {
>>> +        int flags = info_ptr->type;
>>> +        int nparity;
>>> +
>>> +        if (flags & BTRFS_BLOCK_GROUP_RAID5) {
>>> +            nparity = 1;
>>> +        }
>>> +        else if (flags & BTRFS_BLOCK_GROUP_RAID6) {
>>> +            nparity = 2;
>>> +        }
>>> +        else {
>>> +            // We're only interested in raid56 here
>>> +            continue;
>>> +        }
>>
>> I interpret the else continue as "ok, so apparently we can hit these
>> lines of code for other stuff than RAID56", which is expected and fine,
>> and then we'll silently ignore it.
> 
> Yes, this for() loop will be entered in any case, because the replication
> policy (raid5 or raid1) is not a property of the filesystem as a whole,
> it's a property of a chunk. So you can have, in the same filesystem, some
> chunks with a raid1 replication policy, and some others in raid5.
> This can happen if, for example, you start a balance with -dconvert=raid5
> on a previously raid1-only filesystem, and stop it after a few chunks have
> been converted. This is perfectly valid from a btrfs standpoint.
> 
> So, this for() loop's job is to go through each chunkinfo and process only
> the raid5/raid6 ones, because for the other replication profiles, this is
> already handled by the code above this for(): for every profile
> except raid5/raid6, one doesn't need to go through each chunk. For raid1 we
> know that the ratio is exactly 2, because that's the very definition of
> btrfs-raid1. For raid5/6 on the other hand, it depends on the number of
> stripes the chunk has, and this can change over time (as disks are added,
> or if disks are not of the same size).
> 
>> But, in that case the value of nparity is not initialized and the lines
>> below seem like this part of the code is actually never executed for
>> anything that's not RAID56, so why the else up here? Or, maybe print a
>> big error or crash the program, if the else should never happen?
> 
> Actually, the else is there to just go check the next chunk in case
> the one we're working on is not raid5 or raid6, because this for()
> loop only counts the number of bytes in raid5/raid6 chunks (to later
> be able to compute the ratio). We just want to skip all the others
> (they've already been handled by preexisting code).

Oh, right. I'm stupid. Continue skips the rest of the for loop body, yes.
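Condensed, that skip logic boils down to something like this (a sketch paraphrasing the patch's if/else-if/else, not its literal code; the flag values are from ctree.h and the helper name is mine):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Block group profile flags, as defined in ctree.h */
#define BTRFS_BLOCK_GROUP_RAID1	(1ULL << 4)
#define BTRFS_BLOCK_GROUP_RAID5	(1ULL << 7)
#define BTRFS_BLOCK_GROUP_RAID6	(1ULL << 8)

/*
 * Number of parity stripes for a raid5/6 chunk, or -1 when the chunk
 * uses any other profile, in which case the caller "continue"s past it.
 */
static int raid56_nparity(u64 flags)
{
	if (flags & BTRFS_BLOCK_GROUP_RAID5)
		return 1;
	if (flags & BTRFS_BLOCK_GROUP_RAID6)
		return 2;
	return -1;
}
```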

> Also, as nparity is only used if we're inspecting a raid5/6 chunk,
> there shouldn't be a case where it is referenced and not initialized
> AFAICT.
> 
>>
>>> +
>>> +        if (flags & BTRFS_BLOCK_GROUP_DATA) {
>>> +            r_data_raid56_chunks += info_ptr->size /
>>> (info_ptr->num_stripes - nparity);
>>
>> Ah, so this is the calculation of device extent size. Still wondering
>> what the r would mean.
> 
> Here, we're counting the number of bytes used by raid5 or raid6
> chunks, physically on the devices.

The calculation computes the device extent length (which is in btrfs
terminology the same as length of "a stripe"). So after you find out the
dev_extent length, it needs to be multiplied by the num_stripes of the
chunk to get the total amount of raw bytes that are allocated.

If you have 6 disks, and a 5GiB RAID5 chunk with a 1GiB dev_extent on
each of them, then it looks like the above is just adding 1GiB instead
of 6GiB?

And that way of calculating it is only valid for RAID56, because it
ignores ncopies (about which more in a bit).
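Spelling that example out with the patch's formula (this assumes info_ptr->size is the logical chunk length; the helper name is mine, for illustration):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

#define GIB (1ULL << 30)

/*
 * What the patch adds to r_data_raid56_chunks for one chunk: the
 * logical chunk length divided by the data stripe count, which is the
 * length of a single dev extent, not the raw total.
 */
static u64 patch_r_bytes(u64 chunk_len, int num_stripes, int nparity)
{
	return chunk_len / (num_stripes - nparity);
}
```

For the 6-disk case: patch_r_bytes(5GiB, 6, 1) is 1GiB, while the raw bytes actually allocated on the devices are num_stripes times that, i.e. 6GiB.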

>>> +            l_data_raid56_chunks += info_ptr->size /
>>> info_ptr->num_stripes;
>>
>> I still don't understand what meaning this has. Chunk length divided by
>> num_stripes (with parity)...
>>
>> A variable named data_raid56_chunks tells me that you are counting the
>> amount of chunks that have type DATA and profile RAID5 or RAID6, but
>> this code is doing something really different, it's counting bytes. But
>> what sort of bytes...
> 
> It's a bit misleading indeed; actually, a previous version of my patch
> had this variable named "r_data_raid56_size", but I noticed that in
> preexisting code, "r_data_chunks" is used to count the number of bytes
> (and not the number of chunks) used by non-raid5/6 chunks, as you can see:

What I mean is, the number that you get when chunk size (length) is
divided by num_stripes doesn't really have a useful meaning because of
parity that's also there.

If l_data_raid56_chunks counts the logical/virtual bytes that are
available to be used, then that's just chunk size itself?

Or are these r/l_data_raid56_chunks counting things for only a single
disk? I don't get that idea from the code.

> r_data_chunks += sargs->spaces[i].total_bytes * ratio;

The ratio thing works if there's an easy translation between chunk size
and the total amount of raw allocated bytes for that chunk.

Ahh... I get a flashback now. Actually, I went through all of this a
few months ago when writing the usage report module in python-btrfs.

If you have nparity, then you can get rid of the whole ratio thing, and
make a more generic thing that works for everything.

The 'ratio' code is probably from the time before RAID56 was added, and
at that time everything was simple. When RAID56 was introduced it became
more complex, and instead of fixing it properly the extra functions and
if statements were duct-taped on top.

> And a similar line for the "l_" variant:
> 
> l_data_chunks += sargs->spaces[i].total_bytes;
> 
> So I've renamed my variables to stick with this naming convention.
> 
>>> +        }
>>> +        if (flags & BTRFS_BLOCK_GROUP_METADATA) {
>>> +            r_metadata_raid56_chunks += info_ptr->size /
>>> (info_ptr->num_stripes - nparity);
>>> +            l_metadata_raid56_chunks += info_ptr->size /
>>> info_ptr->num_stripes;
>>> +        }
>>> +    }
>>> +
>>> +    if (l_data_raid56_chunks > 0) {
>>> +        double ratio = (double)r_data_raid56_chunks /
>>> l_data_raid56_chunks;
>>> +        r_data_used += r_data_raid56_chunks;
>>> +        r_data_chunks += r_data_raid56_chunks;
>>> +        l_data_chunks += l_data_raid56_chunks;
>>> +        if (ratio > max_data_ratio)
>>> +            max_data_ratio = ratio;
>>> +    }
>>> +    if (l_metadata_raid56_chunks > 0) {
>>> +        r_metadata_used += r_metadata_raid56_chunks;
>>> +        r_metadata_chunks += r_metadata_raid56_chunks;
>>> +        l_metadata_chunks += l_metadata_raid56_chunks;
>>> +    }
>>> +
>>>      r_total_chunks = r_data_chunks + r_system_chunks;
>>>      r_total_used = r_data_used + r_system_used;
>>>      if (!mixed) {
>>>
>>
>> I'm not a super-experienced C coder, but I can read C and write patches
>> while looking at the history of the code in some place, and following
>> patterns. What I'm throwing at you here is some feedback about
>> readability. I'd like to help reviewing this, but I can't really
>> understand it.
> 
> Thanks for your time reviewing this patch, and in any case I hope I've shed
> some light on what it does exactly! Please don't hesitate to comment back
> if anything is still not clear!

nparity is actually a thing. Having it makes it possible to add more
generic translation helper functions between chunk length and device
extent length (aka stripe size).

I wanted to add this in my code...

https://github.com/knorrie/python-btrfs/commit/4525d1e38c19ad8bce38bff7301934998764883c

...so I proposed to also add the same thing in the kernel code, which
immediately already caused getting rid of an ugly if block...

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b50836edf9fe531c66310071df59eac2d8dfc708

Now I see that progs has a copy of all of this, and I should have sent
patches for that also, and also for this one, which still has the old
incorrect values in progs:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da612e31aee51bd13231c78a47c714b543bd3ad8

If ncopies and nparity have the correct values, then you can make a few
helper functions which replace the whole ratio thing and the raid56
exceptions and make all of the code simpler, like, translating chunk
length to dev extent length and back, which works identical for all
profiles.
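A sketch of such helpers, under the assumption that ncopies and nparity carry the per-profile values from the kernel's btrfs_raid_attr table (the function names are mine, not existing btrfs-progs API):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/*
 * Generic translation between logical chunk length and dev extent
 * ("stripe") length; no raid56 special-casing needed. Assumed
 * per-profile values:
 *   raid0:  ncopies=1 nparity=0    raid1/dup: ncopies=2 nparity=0
 *   raid10: ncopies=2 nparity=0    raid5:     ncopies=1 nparity=1
 *   raid6:  ncopies=1 nparity=2
 */
static u64 chunk_to_dev_extent_length(u64 chunk_len, int num_stripes,
				      int ncopies, int nparity)
{
	return chunk_len * ncopies / (num_stripes - nparity);
}

/* Total raw bytes allocated across all devices for the chunk. */
static u64 chunk_to_raw_bytes(u64 chunk_len, int num_stripes,
			      int ncopies, int nparity)
{
	return chunk_to_dev_extent_length(chunk_len, num_stripes,
					  ncopies, nparity) * num_stripes;
}
```

With something like this, the whole ratio machinery reduces to chunk_to_raw_bytes() divided by the chunk length, computed per chunk, for every profile alike.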

Feel free to steal all of the above things from the python code and
port them to C.

Also notice that btrfs naming for things is wildly different and
confusing for whoever knows common RAID terminology. This makes it even
harder for people with related knowledge in the industry to read btrfs
code, because it continuously seems to be doing things other than what
you expect.

"normal" RAID:
 * stripe: a horizontal collection of data + parity chunks
 * chunk: the part on 1 disk of a complete stripe

btrfs:
 * chunk: some amount of virtual address space. When translated to the
total amount of physical bytes, this would be similar to a "normal" stripe.
 * stripe: the part on 1 disk of the total physical space occupied by a
btrfs chunk, so similar to a "normal" chunk.

So, it's like, the exact opposite thing, but using the same names.

Hans


end of thread, other threads:[~2019-03-19  1:14 UTC | newest]

Thread overview: 8+ messages
2019-03-17 12:51 [PATCH 0/1] btrfs-progs: fi usage: implement raid56 stephane_btrfs
2019-03-17 12:51 ` [PATCH 1/1] " stephane_btrfs
2019-03-17 18:41   ` Goffredo Baroncelli
2019-03-18 19:12     ` Stéphane Lesimple
2019-03-18 21:05       ` Goffredo Baroncelli
2019-03-17 23:23   ` Hans van Kranenburg
2019-03-18 19:41     ` Stéphane Lesimple
2019-03-19  1:14       ` Hans van Kranenburg
