All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] mdadm/Detail: show correct state for cluster-md array
@ 2020-07-22  7:11 Zhao Heming
  2020-07-22  7:20 ` heming.zhao
  0 siblings, 1 reply; 5+ messages in thread
From: Zhao Heming @ 2020-07-22  7:11 UTC (permalink / raw)
  To: linux-raid; +Cc: Zhao Heming, neilb, jes

After kernel md module commit 480523feae581, in md-cluster env,
mddev->in_sync always zero, it will make array.state never set
up MD_SB_CLEAN. it causes "mdadm -D /dev/mdX" show state 'active'
all the time.

bitmap.c: add a new API IsBitmapDirty() to support inquiry bitmap
dirty or clean.

Signed-off-by: Zhao Heming <heming.zhao@suse.com>
---
v2:
- Detail.c: change to read only one device.
- bitmap.c: modify IsBitmapDirty() to check all bitmap on the selected device.

---
 Detail.c | 20 +++++++++++++++++++-
 bitmap.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mdadm.h  |  1 +
 3 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/Detail.c b/Detail.c
index 24eeba0..cb5ce7d 100644
--- a/Detail.c
+++ b/Detail.c
@@ -495,8 +495,26 @@ int Detail(char *dev, struct context *c)
 							  sra->array_state);
 				else
 					arrayst = "clean";
-			} else
+			} else {
 				arrayst = "active";
+				if (array.state & (1<<MD_SB_CLUSTERED)) {
+					for (d = 0; d < max_disks * 2; d++) {
+						char *dv;
+						mdu_disk_info_t disk = disks[d];
+
+						if (d >= array.raid_disks * 2 &&
+								disk.major == 0 && disk.minor == 0)
+							continue;
+						if ((d & 1) && disk.major == 0 && disk.minor == 0)
+							continue;
+						dv = map_dev_preferred(disk.major, disk.minor, 0,
+												c->prefer);
+						if (dv && !IsBitmapDirty(dv))
+							arrayst = "clean";
+						break;
+					}
+				}
+			}
 
 			printf("             State : %s%s%s%s%s%s%s \n",
 			       arrayst, st,
diff --git a/bitmap.c b/bitmap.c
index e38cb96..1095dc8 100644
--- a/bitmap.c
+++ b/bitmap.c
@@ -368,6 +368,61 @@ free_info:
 	return rv;
 }
 
+int IsBitmapDirty(char *filename)
+{
+	/*
+	 * Read the bitmap file
+	 * This function is currently for cluster-md only.
+	 * Return: 1(dirty), 0 (clean), -1(error)
+	 */
+
+	int fd = -1, rv = 0, i;
+	struct supertype *st = NULL;
+	bitmap_info_t *info = NULL;
+	bitmap_super_t *sb = NULL;
+
+	fd = bitmap_file_open(filename, &st, 0);
+	free(st);
+	if (fd < 0)
+		goto out;
+
+	info = bitmap_fd_read(fd, 0);
+	close(fd);
+	if (!info)
+		goto out;
+
+	sb = &info->sb;
+	for (i = 0; i < (int)sb->nodes; i++) {
+		st = NULL;
+		free(info);
+		info = NULL;
+
+		fd = bitmap_file_open(filename, &st, i);
+		free(st);
+		if (fd < 0)
+			goto out;
+
+		info = bitmap_fd_read(fd, 0);
+		close(fd);
+		if (!info)
+			goto out;
+
+		sb = &info->sb;
+		if (sb->magic != BITMAP_MAGIC) { /* invalid bitmap magic */
+			free(info);
+			goto out;
+		}
+
+		if (info->dirty_bits)
+			rv = 1;
+	}
+
+	free(info);
+	return rv;
+out:
+	return -1;
+}
+
 int CreateBitmap(char *filename, int force, char uuid[16],
 		 unsigned long chunksize, unsigned long daemon_sleep,
 		 unsigned long write_behind,
diff --git a/mdadm.h b/mdadm.h
index 399478b..ba8ba91 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -1447,6 +1447,7 @@ extern int CreateBitmap(char *filename, int force, char uuid[16],
 			unsigned long long array_size,
 			int major);
 extern int ExamineBitmap(char *filename, int brief, struct supertype *st);
+extern int IsBitmapDirty(char *filename);
 extern int Write_rules(char *rule_name);
 extern int bitmap_update_uuid(int fd, int *uuid, int swap);
 
-- 
2.25.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] mdadm/Detail: show correct state for cluster-md array
  2020-07-22  7:11 [PATCH v2] mdadm/Detail: show correct state for cluster-md array Zhao Heming
@ 2020-07-22  7:20 ` heming.zhao
  2020-07-26  8:14   ` Wols Lists
  0 siblings, 1 reply; 5+ messages in thread
From: heming.zhao @ 2020-07-22  7:20 UTC (permalink / raw)
  To: linux-raid; +Cc: neilb, jes

During I was creating patch, I found the ExamineBitmap() has memory leak issue.
I am not sure whether the leak issue should be fixed.
(Because when mdadm cmd finish, all leaked memory will be released).
The IsBitmapDirty() used some of ExamineBitmap() code, and I only fixed leaked issue in IsBitmapDirty().

Thanks,
heming

On 7/22/20 3:11 PM, Zhao Heming wrote:
> After kernel md module commit 480523feae581, in md-cluster env,
> mddev->in_sync always zero, it will make array.state never set
> up MD_SB_CLEAN. it causes "mdadm -D /dev/mdX" show state 'active'
> all the time.
> 
> bitmap.c: add a new API IsBitmapDirty() to support inquiry bitmap
> dirty or clean.
> 
> Signed-off-by: Zhao Heming <heming.zhao@suse.com>
> ---
> v2:
> - Detail.c: change to read only one device.
> - bitmap.c: modify IsBitmapDirty() to check all bitmap on the selected device.
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] mdadm/Detail: show correct state for cluster-md array
  2020-07-22  7:20 ` heming.zhao
@ 2020-07-26  8:14   ` Wols Lists
  2020-07-26  9:22     ` heming.zhao
  0 siblings, 1 reply; 5+ messages in thread
From: Wols Lists @ 2020-07-26  8:14 UTC (permalink / raw)
  To: heming.zhao, linux-raid; +Cc: neilb, jes

On 22/07/20 08:20, heming.zhao@suse.com wrote:
> During I was creating patch, I found the ExamineBitmap() has memory leak issue.
> I am not sure whether the leak issue should be fixed.
> (Because when mdadm cmd finish, all leaked memory will be released).
> The IsBitmapDirty() used some of ExamineBitmap() code, and I only fixed leaked issue in IsBitmapDirty().
> 
My gut feel?

Firstly, "do things right" - it should be fixed.
Second - are you sure this code is not run while mdadm is running as a
daemon? It's all very well saying it will be released, but but mdadm
could be running for a looonnngg time.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] mdadm/Detail: show correct state for cluster-md array
  2020-07-26  8:14   ` Wols Lists
@ 2020-07-26  9:22     ` heming.zhao
  2020-07-26 14:21       ` Wols Lists
  0 siblings, 1 reply; 5+ messages in thread
From: heming.zhao @ 2020-07-26  9:22 UTC (permalink / raw)
  To: Wols Lists, linux-raid; +Cc: neilb, jes

Hello Wols,

I just started to learn mdadm code. Maybe there are some historical reasons to keep leaked issue. 
I guess your said daemon mode is: "mdadm --monitor --daemonise ...".
After very quickly browsing the code in Monitor.c, these mode check /proc/mdstat, send ioctl GET_ARRAY_INFO, and
read some /sys/block/mdX/md/xx files. There is no way to call ExamineBitmap().
In currently mdadm code, the only way to call ExamineBitmap() is by cmd "mdadm -X /dev/sdX". So as my last mail said, when the mdadm program finish, all leaked memory will be released.
And last week, before I send v2 patch, I try to use valgrind to check memory related issue, there are many places to leak. e.g. 
```
<1>
# valgrind --leak-check=full  ./mdadm -D /dev/md0
... ...
==3929== 
==3929== HEAP SUMMARY:
==3929==     in use at exit: 12,991 bytes in 190 blocks
==3929==   total heap usage: 354 allocs, 164 frees, 2,414,075 bytes allocated
==3929== 
==3929== 184 bytes in 1 blocks are definitely lost in loss record 15 of 24
==3929==    at 0x4C306B5: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3929==    by 0x47EB4C: xcalloc (xmalloc.c:62)
==3929==    by 0x4495E2: match_metadata_desc1 (super1.c:2316)
==3929==    by 0x4125CE: super_by_fd (util.c:1213)
==3929==    by 0x424E53: Detail (Detail.c:103)
==3929==    by 0x408AAA: misc_list (mdadm.c:1970)
==3929==    by 0x407CEF: main (mdadm.c:1640)
==3929== 
==3929== LEAK SUMMARY:
==3929==    definitely lost: 184 bytes in 1 blocks
==3929==    indirectly lost: 0 bytes in 0 blocks
==3929==      possibly lost: 0 bytes in 0 blocks
==3929==    still reachable: 12,807 bytes in 189 blocks
==3929==         suppressed: 0 bytes in 0 blocks
==3929== Reachable blocks (those to which a pointer was found) are not shown.
==3929== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==3929== 
==3929== For lists of detected and suppressed errors, rerun with: -s
==3929== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)


<2>
valgrind --leak-check=full  ./mdadm -X /dev/sda
 ... ...
==4077== 
==4077== HEAP SUMMARY:
==4077==     in use at exit: 8,944 bytes in 58 blocks
==4077==   total heap usage: 161 allocs, 103 frees, 458,399 bytes allocated
==4077== 
==4077== 184 bytes in 1 blocks are definitely lost in loss record 13 of 19
==4077==    at 0x4C306B5: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4077==    by 0x47EB4C: xcalloc (xmalloc.c:62)
==4077==    by 0x412885: guess_super_type (util.c:1290)
==4077==    by 0x47359F: guess_super (mdadm.h:1222)
==4077==    by 0x473C1C: bitmap_file_open (bitmap.c:205)
==4077==    by 0x473DB1: ExamineBitmap (bitmap.c:253)
==4077==    by 0x408B62: misc_list (mdadm.c:1988)
==4077==    by 0x407CEF: main (mdadm.c:1640)
==4077== 
==4077== 736 bytes in 4 blocks are definitely lost in loss record 15 of 19
==4077==    at 0x4C306B5: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4077==    by 0x47EB4C: xcalloc (xmalloc.c:62)
==4077==    by 0x412885: guess_super_type (util.c:1290)
==4077==    by 0x47359F: guess_super (mdadm.h:1222)
==4077==    by 0x473C1C: bitmap_file_open (bitmap.c:205)
==4077==    by 0x4742A5: ExamineBitmap (bitmap.c:337)
==4077==    by 0x408B62: misc_list (mdadm.c:1988)
==4077==    by 0x407CEF: main (mdadm.c:1640)
==4077== 
==4077== LEAK SUMMARY:
==4077==    definitely lost: 920 bytes in 5 blocks
==4077==    indirectly lost: 0 bytes in 0 blocks
==4077==      possibly lost: 0 bytes in 0 blocks
==4077==    still reachable: 8,024 bytes in 53 blocks
==4077==         suppressed: 0 bytes in 0 blocks
==4077== Reachable blocks (those to which a pointer was found) are not shown.
==4077== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==4077== 
==4077== For lists of detected and suppressed errors, rerun with: -s
==4077== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)


<3>
# valgrind --leak-check=full  ./mdadm -a /dev/md0 /dev/sdc
  ... ...
==4096== Warning: noted but unhandled ioctl 0x1269 with no size/direction hints.
==4096==    This could cause spurious value errors to appear.
==4096==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
mdadm: added /dev/sdc
==4096== Syscall param write(buf) points to uninitialised byte(s)
==4096==    at 0x512C244: write (in /lib64/libc-2.26.so)
==4096==    by 0x57FB706: ??? (in /usr/lib64/libdlm_lt.so.3.0)
==4096==    by 0x57FC0F1: dlm_ls_unlock (in /usr/lib64/libdlm_lt.so.3.0)
==4096==    by 0x40F84E: cluster_release_dlmlock (util.c:198)
==4096==    by 0x40837B: main (mdadm.c:1780)
==4096==  Address 0x1ffefffc0e is on thread 1's stack
==4096==  in frame #2, created by dlm_ls_unlock (???:)
==4096== 
==4096== Syscall param write(buf) points to uninitialised byte(s)
==4096==    at 0x512C244: write (in /lib64/libc-2.26.so)
==4096==    by 0x57FC4E0: dlm_release_lockspace (in /usr/lib64/libdlm_lt.so.3.0)
==4096==    by 0x40F906: cluster_release_dlmlock (util.c:218)
==4096==    by 0x40837B: main (mdadm.c:1780)
==4096==  Address 0x1ffeffeb5e is on thread 1's stack
==4096==  in frame #1, created by dlm_release_lockspace (???:)
==4096== 
==4096== 
==4096== HEAP SUMMARY:
==4096==     in use at exit: 13,737 bytes in 197 blocks
==4096==   total heap usage: 278 allocs, 81 frees, 3,253,146 bytes allocated
==4096== 
==4096== 184 bytes in 1 blocks are definitely lost in loss record 19 of 30
==4096==    at 0x4C306B5: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4096==    by 0x47EB4C: xcalloc (xmalloc.c:62)
==4096==    by 0x4495E2: match_metadata_desc1 (super1.c:2316)
==4096==    by 0x4125CE: super_by_fd (util.c:1213)
==4096==    by 0x419258: Manage_subdevs (Manage.c:1344)
==4096==    by 0x407398: main (mdadm.c:1477)
==4096== 
==4096== 184 bytes in 1 blocks are definitely lost in loss record 20 of 30
==4096==    at 0x4C306B5: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4096==    by 0x47EB4C: xcalloc (xmalloc.c:62)
==4096==    by 0x4127E7: dup_super (util.c:1268)
==4096==    by 0x417D73: Manage_add (Manage.c:813)
==4096==    by 0x419C3F: Manage_subdevs (Manage.c:1564)
==4096==    by 0x407398: main (mdadm.c:1477)
==4096== 
==4096== LEAK SUMMARY:
==4096==    definitely lost: 368 bytes in 2 blocks
==4096==    indirectly lost: 0 bytes in 0 blocks
==4096==      possibly lost: 0 bytes in 0 blocks
==4096==    still reachable: 13,369 bytes in 195 blocks
==4096==         suppressed: 0 bytes in 0 blocks
==4096== Reachable blocks (those to which a pointer was found) are not shown.
==4096== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==4096== 
==4096== Use --track-origins=yes to see where uninitialised values come from
==4096== For lists of detected and suppressed errors, rerun with: -s
==4096== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)
```

Thanks,
heming

On 7/26/20 4:14 PM, Wols Lists wrote:
> On 22/07/20 08:20, heming.zhao@suse.com wrote:
>> During I was creating patch, I found the ExamineBitmap() has memory leak issue.
>> I am not sure whether the leak issue should be fixed.
>> (Because when mdadm cmd finish, all leaked memory will be released).
>> The IsBitmapDirty() used some of ExamineBitmap() code, and I only fixed leaked issue in IsBitmapDirty().
>>
> My gut feel?
> 
> Firstly, "do things right" - it should be fixed.
> Second - are you sure this code is not run while mdadm is running as a
> daemon? It's all very well saying it will be released, but but mdadm
> could be running for a looonnngg time.
> 
> Cheers,
> Wol
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] mdadm/Detail: show correct state for cluster-md array
  2020-07-26  9:22     ` heming.zhao
@ 2020-07-26 14:21       ` Wols Lists
  0 siblings, 0 replies; 5+ messages in thread
From: Wols Lists @ 2020-07-26 14:21 UTC (permalink / raw)
  To: heming.zhao, linux-raid; +Cc: neilb, jes

On 26/07/20 10:22, heming.zhao@suse.com wrote:
> Hello Wols,
> 
> I just started to learn mdadm code. Maybe there are some historical reasons to keep leaked issue. 
> I guess your said daemon mode is: "mdadm --monitor --daemonise ...".

> After very quickly browsing the code in Monitor.c, these mode check /proc/mdstat, send ioctl GET_ARRAY_INFO, and
> read some /sys/block/mdX/md/xx files. There is no way to call ExamineBitmap().

> In currently mdadm code, the only way to call ExamineBitmap() is by cmd "mdadm -X /dev/sdX". So as my last mail said, when the mdadm program finish, all leaked memory will be released.
> And last week, before I send v2 patch, I try to use valgrind to check memory related issue, there are many places to leak. e.g. 

You're learning the mdadm code? Personally, I find it hard to learn
stuff if there's no purpose behind what I'm studying. Treat it as a
learning exercise and fix all the leaks ;-)

As they say, if a job's worth doing it's worth doing well, and you can
learn what stuff is doing while you're working your way through it.

I need to learn my way around mdadm, and I've got a task in mind that'll
teach me a load of it, so all being well I'll soon be following in your
footsteps ...

Cheers,
Wol

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-07-26 14:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-22  7:11 [PATCH v2] mdadm/Detail: show correct state for cluster-md array Zhao Heming
2020-07-22  7:20 ` heming.zhao
2020-07-26  8:14   ` Wols Lists
2020-07-26  9:22     ` heming.zhao
2020-07-26 14:21       ` Wols Lists

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.