* [PATCH v10 0/3] make hvc pass dma capable memory to its backend
@ 2021-10-09 11:48 ` Xianting Tian
  0 siblings, 0 replies; 26+ messages in thread
From: Xianting Tian @ 2021-10-09 11:48 UTC (permalink / raw)
  To: gregkh, jirislaby, amit, arnd, osandov
  Cc: shile.zhang, linuxppc-dev, virtualization, linux-kernel, Xianting Tian

Dear all,

This patch series makes the hvc framework pass DMA-capable memory to
put_chars() of the hvc backend (e.g. virtio-console), and reverts commit
c4baad5029 ("virtio-console: avoid DMA from stack").

V1
virtio-console: avoid DMA from vmalloc area
https://lkml.org/lkml/2021/7/27/494

For the v1 patch, Arnd Bergmann suggested fixing the issue at its
source:
make hvc pass DMA-capable memory to put_chars().
This suggestion is implemented in v2.

V2
[PATCH 1/2] tty: hvc: pass DMA capable memory to put_chars()
https://lkml.org/lkml/2021/8/1/8
[PATCH 2/2] virtio-console: remove unnecessary kmemdup()
https://lkml.org/lkml/2021/8/1/9

For the v2 patch, Arnd Bergmann suggested making the new buffer part of
the hvc_struct structure and fixing the compile issue.
These suggestions are implemented in v3.

V3
[PATCH v3 1/2] tty: hvc: pass DMA capable memory to put_chars()
https://lkml.org/lkml/2021/8/3/1347
[PATCH v3 2/2] virtio-console: remove unnecessary kmemdup()
https://lkml.org/lkml/2021/8/3/1348

For the v3 patch, Jiri Slaby suggested making 'char c[N_OUTBUF]' part of
hvc_struct, aligning 'hp->outbuf', and using struct_size() to
calculate the size of hvc_struct. These suggestions are implemented in
v4.

V4
[PATCH v4 0/2] make hvc pass dma capable memory to its backend
https://lkml.org/lkml/2021/8/5/1350
[PATCH v4 1/2] tty: hvc: pass DMA capable memory to put_chars()
https://lkml.org/lkml/2021/8/5/1351
[PATCH v4 2/2] virtio-console: remove unnecessary kmemdup()
https://lkml.org/lkml/2021/8/5/1352

For the v4 patch, Arnd Bergmann suggested introducing another
array (cons_outbuf[]) for the buffer pointers next to the cons_ops[]
and vtermnos[] arrays. This change is included in v5.

V5
Arnd Bergmann suggested using L1_CACHE_BYTES as the DMA alignment,
because using 'sizeof(long)' as the DMA alignment is wrong. Fixed in v6.

V6
It contained a coding error, which is fixed in v7; v7 worked normally
according to the test results.

V7
Greg KH suggested having another developer test and review the code.
Jiri Slaby suggested using a lockless buffer and fixing the DMA
alignment in a separate patch.
Both points are addressed in v8.

V8
It contained a coding error in the switch to the new buffer. Fixed in v9.

V9
It did not make things much clearer: the newly added buffers need more
comments and a lock to protect them. Fixed in v10.

********TEST STEPS*********
1. Configure the guest with console=hvc0
2. Start the guest
3. Log in to the guest
    Welcome to Buildroot
    buildroot login: root
    # 
    # cat /proc/cmdline 
    console=hvc0,115200 
    #

drivers/tty/hvc/hvc_console.c | 38 +++++++++++++++++++++--------------
drivers/tty/hvc/hvc_console.h | 24 ++++++++++++++++++++--
drivers/char/virtio_console.c | 12 ++----------
3 files changed

* [PATCH v10 1/3] tty: hvc: use correct dma alignment size
  2021-10-09 11:48 ` Xianting Tian
@ 2021-10-09 11:48   ` Xianting Tian
  -1 siblings, 0 replies; 26+ messages in thread
From: Xianting Tian @ 2021-10-09 11:48 UTC (permalink / raw)
  To: gregkh, jirislaby, amit, arnd, osandov
  Cc: shile.zhang, linuxppc-dev, virtualization, linux-kernel, Xianting Tian

Use L1_CACHE_BYTES as the DMA alignment size; using 'sizeof(long)'
is wrong.

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
---
 drivers/tty/hvc/hvc_console.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 5bb8c4e44..5957ab728 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -49,7 +49,7 @@
 #define N_OUTBUF	16
 #define N_INBUF		16
 
-#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))
+#define __ALIGNED__ __attribute__((__aligned__(L1_CACHE_BYTES)))
 
 static struct tty_driver *hvc_driver;
 static struct task_struct *hvc_task;
-- 
2.17.1
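
The effect of the alignment change in this patch can be sketched in
userspace C. This is only an illustration, not kernel code:
DEMO_CACHE_BYTES is an assumed 64-byte stand-in for L1_CACHE_BYTES
(the real value is architecture-specific), and every demo_* name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64 stands in for the kernel's per-architecture L1_CACHE_BYTES. */
#define DEMO_CACHE_BYTES 64
#define DEMO_N_OUTBUF    16

/* Old alignment: the buffer is only word-aligned inside the struct. */
struct demo_word_aligned {
	char tag;
	char buf[DEMO_N_OUTBUF] __attribute__((__aligned__(sizeof(long))));
};

/* New alignment: the buffer starts on a cache-line boundary. */
struct demo_line_aligned {
	char tag;
	char buf[DEMO_N_OUTBUF] __attribute__((__aligned__(DEMO_CACHE_BYTES)));
};

/* A static object, so the toolchain places it with the requested alignment. */
static struct demo_line_aligned demo_line_obj;

/* Returns 1 when p starts on a (demo) cache-line boundary, 0 otherwise. */
static inline int demo_is_line_aligned(const void *p)
{
	return ((uintptr_t)p % DEMO_CACHE_BYTES) == 0;
}
```

With the word-sized alignment the buffer begins right after the
preceding field, so it can share a cache line with unrelated data.
With cache-line alignment the buffer begins on a line of its own and
the struct is padded so that nothing else follows it on that line.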


* [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()
  2021-10-09 11:48 ` Xianting Tian
@ 2021-10-09 11:48   ` Xianting Tian
  -1 siblings, 0 replies; 26+ messages in thread
From: Xianting Tian @ 2021-10-09 11:48 UTC (permalink / raw)
  To: gregkh, jirislaby, amit, arnd, osandov
  Cc: shile.zhang, linuxppc-dev, virtualization, linux-kernel, Xianting Tian

As is well known, an hvc backend can register its operations with the
hvc framework. The operations include put_chars(), get_chars() and so on.

Some hvc backends may do DMA in their operations, e.g. put_chars() of
virtio-console. But the hvc framework may pass DMA-incapable memory to
put_chars() under a specific configuration, which is explained in
commit c4baad5029 ("virtio-console: avoid DMA from stack"):
1. c[] is on the stack:
   hvc_console_print():
	char c[N_OUTBUF] __ALIGNED__;
	cons_ops[index]->put_chars(vtermnos[index], c, i);
2. ch is on the stack:
   static void hvc_poll_put_char(struct tty_driver *driver, int line, char ch)
   {
	struct tty_struct *tty = driver->ttys[0];
	struct hvc_struct *hp = tty->driver_data;
	int n;

	do {
		n = hp->ops->put_chars(hp->vtermno, &ch, 1);
	} while (n <= 0);
   }

Commit c4baad5029 is the fix that avoids DMA from the stack memory
that the hvc framework passes to virtio-console in the code above. But I
think the fix is too aggressive: it unconditionally uses kmemdup() to
allocate a new buffer from the kmalloc area and copies the data, whether
or not the memory is already in the kmalloc area. More importantly, the
problem is better fixed in the hvc framework, by changing it to never
pass stack memory to put_chars() in the first place. Otherwise, we would
face the same issue again if a new hvc backend that uses DMA is added in
the future.

This patch adds 'char cons_outbuf[]' to 'struct hvc_struct', so
hp->cons_outbuf is no longer stack memory and can be used in case 1
above. It also adds 'char outchar' to 'struct hvc_struct' for use in
case 2 above. Each of these buffers gets its own lock, so they are
protected separately instead of by the global hvc lock.

Introduce another array (cons_hvcs[]) of hvc pointers next to the
cons_ops[] and vtermnos[] arrays. With this array, we can easily find
an hvc's cons_outbuf and its lock.

With this patch, we can revert the fix in c4baad5029.

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
---
 drivers/tty/hvc/hvc_console.c | 37 +++++++++++++++++++++--------------
 drivers/tty/hvc/hvc_console.h | 24 +++++++++++++++++++++--
 2 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 5bb8c4e44..4d8f112f2 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -41,16 +41,6 @@
  */
 #define HVC_CLOSE_WAIT (HZ/100) /* 1/10 of a second */
 
-/*
- * These sizes are most efficient for vio, because they are the
- * native transfer size. We could make them selectable in the
- * future to better deal with backends that want other buffer sizes.
- */
-#define N_OUTBUF	16
-#define N_INBUF		16
-
-#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))
-
 static struct tty_driver *hvc_driver;
 static struct task_struct *hvc_task;
 
@@ -142,6 +132,7 @@ static int hvc_flush(struct hvc_struct *hp)
 static const struct hv_ops *cons_ops[MAX_NR_HVC_CONSOLES];
 static uint32_t vtermnos[MAX_NR_HVC_CONSOLES] =
 	{[0 ... MAX_NR_HVC_CONSOLES - 1] = -1};
+static struct hvc_struct *cons_hvcs[MAX_NR_HVC_CONSOLES];
 
 /*
  * Console APIs, NOT TTY.  These APIs are available immediately when
@@ -151,9 +142,11 @@ static uint32_t vtermnos[MAX_NR_HVC_CONSOLES] =
 static void hvc_console_print(struct console *co, const char *b,
 			      unsigned count)
 {
-	char c[N_OUTBUF] __ALIGNED__;
+	char *c;
 	unsigned i = 0, n = 0;
 	int r, donecr = 0, index = co->index;
+	unsigned long flags;
+	struct hvc_struct *hp;
 
 	/* Console access attempt outside of acceptable console range. */
 	if (index >= MAX_NR_HVC_CONSOLES)
@@ -163,6 +156,13 @@ static void hvc_console_print(struct console *co, const char *b,
 	if (vtermnos[index] == -1)
 		return;
 
+	hp = cons_hvcs[index];
+	if (!hp)
+		return;
+
+	c = hp->cons_outbuf;
+
+	spin_lock_irqsave(&hp->cons_outbuf_lock, flags);
 	while (count > 0 || i > 0) {
 		if (count > 0 && i < sizeof(c)) {
 			if (b[n] == '\n' && !donecr) {
@@ -191,6 +191,7 @@ static void hvc_console_print(struct console *co, const char *b,
 			}
 		}
 	}
+	spin_unlock_irqrestore(&hp->cons_outbuf_lock, flags);
 	hvc_console_flush(cons_ops[index], vtermnos[index]);
 }
 
@@ -878,9 +879,13 @@ static void hvc_poll_put_char(struct tty_driver *driver, int line, char ch)
 	struct tty_struct *tty = driver->ttys[0];
 	struct hvc_struct *hp = tty->driver_data;
 	int n;
+	unsigned long flags;
 
 	do {
-		n = hp->ops->put_chars(hp->vtermno, &ch, 1);
+		spin_lock_irqsave(&hp->outchar_lock, flags);
+		hp->outchar = ch;
+		n = hp->ops->put_chars(hp->vtermno, &hp->outchar, 1);
+		spin_unlock_irqrestore(&hp->outchar_lock, flags);
 	} while (n <= 0);
 }
 #endif
@@ -922,8 +927,7 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data,
 			return ERR_PTR(err);
 	}
 
-	hp = kzalloc(ALIGN(sizeof(*hp), sizeof(long)) + outbuf_size,
-			GFP_KERNEL);
+	hp = kzalloc(struct_size(hp, outbuf, outbuf_size), GFP_KERNEL);
 	if (!hp)
 		return ERR_PTR(-ENOMEM);
 
@@ -931,13 +935,14 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data,
 	hp->data = data;
 	hp->ops = ops;
 	hp->outbuf_size = outbuf_size;
-	hp->outbuf = &((char *)hp)[ALIGN(sizeof(*hp), sizeof(long))];
 
 	tty_port_init(&hp->port);
 	hp->port.ops = &hvc_port_ops;
 
 	INIT_WORK(&hp->tty_resize, hvc_set_winsz);
 	spin_lock_init(&hp->lock);
+	spin_lock_init(&hp->outchar_lock);
+	spin_lock_init(&hp->cons_outbuf_lock);
 	mutex_lock(&hvc_structs_mutex);
 
 	/*
@@ -964,6 +969,7 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data,
 	if (i < MAX_NR_HVC_CONSOLES) {
 		cons_ops[i] = ops;
 		vtermnos[i] = vtermno;
+		cons_hvcs[i] = hp;
 	}
 
 	list_add_tail(&(hp->next), &hvc_structs);
@@ -988,6 +994,7 @@ int hvc_remove(struct hvc_struct *hp)
 	if (hp->index < MAX_NR_HVC_CONSOLES) {
 		vtermnos[hp->index] = -1;
 		cons_ops[hp->index] = NULL;
+		cons_hvcs[hp->index] = NULL;
 	}
 
 	/* Don't whack hp->irq because tty_hangup() will need to free the irq. */
diff --git a/drivers/tty/hvc/hvc_console.h b/drivers/tty/hvc/hvc_console.h
index 18d005814..98f0ced83 100644
--- a/drivers/tty/hvc/hvc_console.h
+++ b/drivers/tty/hvc/hvc_console.h
@@ -32,13 +32,21 @@
  */
 #define HVC_ALLOC_TTY_ADAPTERS	8
 
+/*
+ * These sizes are most efficient for vio, because they are the
+ * native transfer size. We could make them selectable in the
+ * future to better deal with backends that want other buffer sizes.
+ */
+#define N_OUTBUF	16
+#define N_INBUF		16
+
+#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))
+
 struct hvc_struct {
 	struct tty_port port;
 	spinlock_t lock;
 	int index;
 	int do_wakeup;
-	char *outbuf;
-	int outbuf_size;
 	int n_outbuf;
 	uint32_t vtermno;
 	const struct hv_ops *ops;
@@ -48,6 +56,18 @@ struct hvc_struct {
 	struct work_struct tty_resize;
 	struct list_head next;
 	unsigned long flags;
+
+	/* the buf is used in hvc console api for putting chars */
+	char cons_outbuf[N_OUTBUF] __ALIGNED__;
+	spinlock_t cons_outbuf_lock;
+
+	/* the buf is for putting single char to tty */
+	char outchar;
+	spinlock_t outchar_lock;
+
+	/* the buf is for putting chars to tty */
+	int outbuf_size;
+	char outbuf[0] __ALIGNED__;
 };
 
 /* implemented by a low level driver */
-- 
2.17.1
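
The idea of this patch, moving the output buffers off the stack and
into the per-device structure, can be sketched in userspace C as
follows. This is a simplified illustration under stated assumptions,
not the kernel implementation: struct demo_hvc, demo_put_chars() and
the other demo_* names are hypothetical, calloc() stands in for
kzalloc(struct_size(...)), the backend only records the bytes instead
of doing DMA, and the locking from the patch is left out because the
sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_N_OUTBUF 16

/* Hypothetical stand-in for struct hvc_struct; field names mirror the patch. */
struct demo_hvc {
	char cons_outbuf[DEMO_N_OUTBUF];	/* replaces the on-stack c[] */
	char outchar;				/* replaces the on-stack ch */
	size_t outbuf_size;
	char outbuf[];				/* tail buffer, sized at allocation */
};

/* Hypothetical backend: records what it is handed instead of doing DMA. */
static char demo_sink[256];
static size_t demo_sink_len;

static int demo_put_chars(const char *buf, int count)
{
	assert(count >= 0);
	if (demo_sink_len + (size_t)count > sizeof(demo_sink))
		return 0;			/* sink full, nothing written */
	memcpy(demo_sink + demo_sink_len, buf, (size_t)count);
	demo_sink_len += (size_t)count;
	return count;
}

/* Header and tail buffer come from one zeroed heap allocation. */
static struct demo_hvc *demo_hvc_alloc(size_t outbuf_size)
{
	struct demo_hvc *hp;

	if (outbuf_size > SIZE_MAX - sizeof(*hp))
		return NULL;			/* size computation would overflow */
	hp = calloc(1, sizeof(*hp) + outbuf_size);
	if (hp)
		hp->outbuf_size = outbuf_size;
	return hp;
}

/* Case 1: stage console output in hp->cons_outbuf, one chunk at a time. */
static void demo_console_print(struct demo_hvc *hp, const char *b, size_t count)
{
	while (count > 0) {
		/*
		 * sizeof is applied to the array member here; applied to a
		 * char * alias it would give the pointer size instead.
		 */
		size_t n = count < sizeof(hp->cons_outbuf) ?
			   count : sizeof(hp->cons_outbuf);

		memcpy(hp->cons_outbuf, b, n);
		if (demo_put_chars(hp->cons_outbuf, (int)n) <= 0)
			break;			/* backend refused, drop the rest */
		b += n;
		count -= n;
	}
}

/* Case 2: stage the single character in hp->outchar and pass its address. */
static void demo_poll_put_char(struct demo_hvc *hp, char ch)
{
	hp->outchar = ch;
	demo_put_chars(&hp->outchar, 1);
}
```

In both cases the backend only ever sees memory that lives inside the
heap-allocated structure, never the caller's stack.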


* [PATCH v10 3/3] virtio-console: remove unnecessary kmemdup()
  2021-10-09 11:48 ` Xianting Tian
@ 2021-10-09 11:48   ` Xianting Tian
  -1 siblings, 0 replies; 26+ messages in thread
From: Xianting Tian @ 2021-10-09 11:48 UTC (permalink / raw)
  To: gregkh, jirislaby, amit, arnd, osandov
  Cc: shile.zhang, linuxppc-dev, virtualization, linux-kernel, Xianting Tian

This reverts commit c4baad5029 ("virtio-console: avoid DMA from stack").

The hvc framework will never pass stack memory to the put_chars() function,
so the kmemdup() call is unnecessary and we can remove it.

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
---
 drivers/char/virtio_console.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7eaf303a7..4ed3ffb1d 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1117,8 +1117,6 @@ static int put_chars(u32 vtermno, const char *buf, int count)
 {
 	struct port *port;
 	struct scatterlist sg[1];
-	void *data;
-	int ret;
 
 	if (unlikely(early_put_chars))
 		return early_put_chars(vtermno, buf, count);
@@ -1127,14 +1125,8 @@ static int put_chars(u32 vtermno, const char *buf, int count)
 	if (!port)
 		return -EPIPE;
 
-	data = kmemdup(buf, count, GFP_ATOMIC);
-	if (!data)
-		return -ENOMEM;
-
-	sg_init_one(sg, data, count);
-	ret = __send_to_port(port, sg, 1, count, data, false);
-	kfree(data);
-	return ret;
+	sg_init_one(sg, buf, count);
+	return __send_to_port(port, sg, 1, count, (void *)buf, false);
 }
 
 /*
-- 
2.17.1
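
The difference between the two versions of put_chars() in this patch
can be sketched in userspace C. This is an illustration only:
demo_send() is a hypothetical stand-in for __send_to_port(), malloc()
plus memcpy() stands in for kmemdup(), and demo_allocs exists purely to
make the extra allocation visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char demo_wire[64];	/* what the hypothetical transport last sent */
static int demo_allocs;		/* number of bounce buffers allocated */

/* Hypothetical transport: copies the bytes "onto the wire". */
static int demo_send(const void *data, int count)
{
	if (count < 0 || (size_t)count > sizeof(demo_wire))
		return -1;
	memcpy(demo_wire, data, (size_t)count);
	return count;
}

/* Before: duplicate every buffer, whatever memory the caller passed. */
static int demo_put_chars_copy(const char *buf, int count)
{
	void *data;
	int ret;

	if (count <= 0)
		return 0;		/* nothing to send */
	data = malloc((size_t)count);
	if (!data)
		return -1;		/* the kernel code returned -ENOMEM here */
	memcpy(data, buf, (size_t)count);
	demo_allocs++;
	ret = demo_send(data, count);
	free(data);
	return ret;
}

/* After: rely on the caller passing suitable memory and send it directly. */
static int demo_put_chars_direct(const char *buf, int count)
{
	return demo_send(buf, count);
}
```

The direct version saves one allocation, one copy and one free per
call, but it is only safe as long as every caller passes memory the
backend can use.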


* Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()
  2021-10-09 11:48   ` Xianting Tian
  (?)
@ 2021-10-09 11:55     ` Greg KH
  -1 siblings, 0 replies; 26+ messages in thread
From: Greg KH @ 2021-10-09 11:55 UTC (permalink / raw)
  To: Xianting Tian
  Cc: jirislaby, amit, arnd, osandov, shile.zhang, linuxppc-dev,
	virtualization, linux-kernel

On Sat, Oct 09, 2021 at 07:48:28PM +0800, Xianting Tian wrote:
> As is well known, an hvc backend can register its operations with the
> hvc framework. The operations include put_chars(), get_chars() and so on.
> 
> Some hvc backends may do DMA in their operations, e.g. put_chars() of
> virtio-console. But the hvc framework may pass DMA-incapable memory to
> put_chars() under a specific configuration, which is explained in
> commit c4baad5029 ("virtio-console: avoid DMA from stack"):
> 1. c[] is on the stack:
>    hvc_console_print():
> 	char c[N_OUTBUF] __ALIGNED__;
> 	cons_ops[index]->put_chars(vtermnos[index], c, i);
> 2. ch is on the stack:
>    static void hvc_poll_put_char(struct tty_driver *driver, int line, char ch)
>    {
> 	struct tty_struct *tty = driver->ttys[0];
> 	struct hvc_struct *hp = tty->driver_data;
> 	int n;
> 
> 	do {
> 		n = hp->ops->put_chars(hp->vtermno, &ch, 1);
> 	} while (n <= 0);
>    }
> 
> Commit c4baad5029 is the fix that avoids DMA from the stack memory
> that the hvc framework passes to virtio-console in the code above. But I
> think the fix is too aggressive: it unconditionally uses kmemdup() to
> allocate a new buffer from the kmalloc area and copies the data, whether
> or not the memory is already in the kmalloc area. More importantly, the
> problem is better fixed in the hvc framework, by changing it to never
> pass stack memory to put_chars() in the first place. Otherwise, we would
> face the same issue again if a new hvc backend that uses DMA is added in
> the future.
> 
> This patch adds 'char cons_outbuf[]' to 'struct hvc_struct', so
> hp->cons_outbuf is no longer stack memory and can be used in case 1
> above. It also adds 'char outchar' to 'struct hvc_struct' for use in
> case 2 above. Each of these buffers gets its own lock, so they are
> protected separately instead of by the global hvc lock.
> 
> Introduce another array (cons_hvcs[]) of hvc pointers next to the
> cons_ops[] and vtermnos[] arrays. With this array, we can easily find
> an hvc's cons_outbuf and its lock.
> 
> With this patch, we can revert the fix in c4baad5029.
> 
> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
> ---
>  drivers/tty/hvc/hvc_console.c | 37 +++++++++++++++++++++--------------
>  drivers/tty/hvc/hvc_console.h | 24 +++++++++++++++++++++--
>  2 files changed, 44 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
> index 5bb8c4e44..4d8f112f2 100644
> --- a/drivers/tty/hvc/hvc_console.c
> +++ b/drivers/tty/hvc/hvc_console.c
> @@ -41,16 +41,6 @@
>   */
>  #define HVC_CLOSE_WAIT (HZ/100) /* 1/10 of a second */
>  
> -/*
> - * These sizes are most efficient for vio, because they are the
> - * native transfer size. We could make them selectable in the
> - * future to better deal with backends that want other buffer sizes.
> - */
> -#define N_OUTBUF	16
> -#define N_INBUF		16
> -
> -#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))
> -

Are you sure this applies on top of patch 1/3 here?

> +/*
> + * These sizes are most efficient for vio, because they are the
> + * native transfer size. We could make them selectable in the
> + * future to better deal with backends that want other buffer sizes.
> + */
> +#define N_OUTBUF	16
> +#define N_INBUF		16
> +
> +#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))

Again, are you sure this is correct?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()
  2021-10-09 11:48   ` Xianting Tian
  (?)
@ 2021-10-09 11:58     ` Greg KH
  -1 siblings, 0 replies; 26+ messages in thread
From: Greg KH @ 2021-10-09 11:58 UTC (permalink / raw)
  To: Xianting Tian
  Cc: jirislaby, amit, arnd, osandov, shile.zhang, linuxppc-dev,
	virtualization, linux-kernel

On Sat, Oct 09, 2021 at 07:48:28PM +0800, Xianting Tian wrote:
> --- a/drivers/tty/hvc/hvc_console.h
> +++ b/drivers/tty/hvc/hvc_console.h
> @@ -32,13 +32,21 @@
>   */
>  #define HVC_ALLOC_TTY_ADAPTERS	8
>  
> +/*
> + * These sizes are most efficient for vio, because they are the
> + * native transfer size. We could make them selectable in the
> + * future to better deal with backends that want other buffer sizes.
> + */
> +#define N_OUTBUF	16
> +#define N_INBUF		16
> +
> +#define __ALIGNED__ __attribute__((__aligned__(sizeof(long))))

Does this conflict with what is in hvcs.c?

> +
>  struct hvc_struct {
>  	struct tty_port port;
>  	spinlock_t lock;
>  	int index;
>  	int do_wakeup;
> -	char *outbuf;
> -	int outbuf_size;
>  	int n_outbuf;
>  	uint32_t vtermno;
>  	const struct hv_ops *ops;
> @@ -48,6 +56,18 @@ struct hvc_struct {
>  	struct work_struct tty_resize;
>  	struct list_head next;
>  	unsigned long flags;
> +
> +	/* the buf is used in hvc console api for putting chars */
> +	char cons_outbuf[N_OUTBUF] __ALIGNED__;
> +	spinlock_t cons_outbuf_lock;

Did you look at the placement using pahole as to how this structure now
looks?

> +
> +	/* the buf is for putting single char to tty */
> +	char outchar;
> +	spinlock_t outchar_lock;

So you have a lock for a character and a different one for a longer
string?  Why can they not just use the same lock?  Why are 2 needed at
all, can't you just use the first character of cons_outbuf[] instead?
Surely you do not have 2 sends happening at the same time, right?

> +
> +	/* the buf is for putting chars to tty */
> +	int outbuf_size;
> +	char outbuf[0] __ALIGNED__;

I thought we were not allowing [0] anymore in kernel structures?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread
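
Editorial aside on the `[0]` question above: a zero-length array such as `char outbuf[0]` is a GNU extension, while `char outbuf[]` is the C99 flexible array member that the kernel's deprecated-interfaces documentation asks for. The sketch below is a userspace illustration only, not the hvc code: `struct demo_hvc` and `demo_hvc_alloc()` are made-up names, and calloc() stands in for the kernel allocator. In the kernel, the struct_size() helper from <linux/overflow.h> computes the same header-plus-tail size with overflow checking.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tail of struct hvc_struct. */
struct demo_hvc {
	int outbuf_size;
	char outbuf[];	/* C99 flexible array member, replaces outbuf[0] */
};

/*
 * Allocate the header and outbuf_size trailing bytes as one zeroed block.
 * Returns NULL on a negative size or on allocation failure.
 */
static struct demo_hvc *demo_hvc_alloc(int outbuf_size)
{
	struct demo_hvc *hp;

	if (outbuf_size < 0)
		return NULL;

	/* sizeof(*hp) does not count the flexible array, so add it here. */
	hp = calloc(1, sizeof(*hp) + (size_t)outbuf_size);
	if (!hp)
		return NULL;

	hp->outbuf_size = outbuf_size;
	return hp;
}
```

Because the buffer lives in the same heap allocation as the structure, a pointer to `hp->outbuf` never refers to stack memory, which is the property the series is after.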

* Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()
  2021-10-09 11:58     ` Greg KH
@ 2021-10-09 15:45       ` Xianting Tian
  -1 siblings, 0 replies; 26+ messages in thread
From: Xianting Tian @ 2021-10-09 15:45 UTC (permalink / raw)
  To: Greg KH
  Cc: jirislaby, amit, arnd, osandov, shile.zhang, linuxppc-dev,
	virtualization, linux-kernel


On 2021/10/9 at 7:58 PM, Greg KH wrote:
> Did you look at the placement using pahole as to how this structure now
> looks?

Thanks for all your comments. For this one, do you mean I need to remove
the blank line? Thanks.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()
  2021-10-09 15:45       ` Xianting Tian
  (?)
@ 2021-10-10  5:33         ` Greg KH
  -1 siblings, 0 replies; 26+ messages in thread
From: Greg KH @ 2021-10-10  5:33 UTC (permalink / raw)
  To: Xianting Tian
  Cc: jirislaby, amit, arnd, osandov, shile.zhang, linuxppc-dev,
	virtualization, linux-kernel

On Sat, Oct 09, 2021 at 11:45:23PM +0800, Xianting Tian wrote:
> 
> On 2021/10/9 at 7:58 PM, Greg KH wrote:
> > Did you look at the placement using pahole as to how this structure now
> > looks?
>
> Thanks for all your comments. For this one, do you mean I need to remove the
> blank line? Thanks.
>

No, I mean to use the tool 'pahole' to see the structure layout that you
just created and determine if it really is the best way to add these new
fields, especially as you are adding huge buffers with odd alignment.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread
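
Editorial aside on the pahole request above: pahole reads the compiler's debug information and reports each member's offset and any padding holes. A rough userspace approximation of the same check can be written with offsetof(). The sketch below mirrors the member order of the v10 tail of hvc_struct, under two stated assumptions: `struct demo_v10_tail` is a made-up name, and `int` stands in for a 4-byte spinlock_t (real spinlock_t size depends on the kernel configuration).

```c
#include <stddef.h>

/* Hypothetical copy of the fields v10 appended to struct hvc_struct. */
struct demo_v10_tail {
	unsigned long flags;
	char cons_outbuf[16] __attribute__((__aligned__(sizeof(long))));
	int cons_outbuf_lock;	/* stand-in for a 4-byte spinlock_t */
	char outchar;
	int outchar_lock;	/* stand-in for a 4-byte spinlock_t */
	int outbuf_size;
};

/*
 * Padding the compiler inserts after the single 'outchar' byte so that
 * the following lock is aligned to its natural boundary.
 */
static size_t demo_hole_after_outchar(void)
{
	return offsetof(struct demo_v10_tail, outchar_lock)
		- (offsetof(struct demo_v10_tail, outchar) + sizeof(char));
}
```

With a 4-byte-aligned `int`, the lone `outchar` byte is followed by a 3-byte hole, which is the kind of layout cost pahole makes visible.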

* Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()
  2021-10-10  5:33         ` Greg KH
@ 2021-10-14  8:34           ` Xianting Tian
  -1 siblings, 0 replies; 26+ messages in thread
From: Xianting Tian @ 2021-10-14  8:34 UTC (permalink / raw)
  To: Greg KH
  Cc: jirislaby, amit, arnd, osandov, shile.zhang, linuxppc-dev,
	virtualization, linux-kernel


On 2021/10/10 at 1:33 PM, Greg KH wrote:
> On Sat, Oct 09, 2021 at 11:45:23PM +0800, Xianting Tian wrote:
>> On 2021/10/9 at 7:58 PM, Greg KH wrote:
>>> Did you look at the placement using pahole as to how this structure now
>>> looks?
>> Thanks for all your comments. For this one, do you mean I need to remove the
>> blank line? Thanks.
>>
> No, I mean to use the tool 'pahole' to see the structure layout that you
> just created and determine if it really is the best way to add these new
> fields, especially as you are adding huge buffers with odd alignment.

Thanks.

Based on your comments, I removed 'char outchar' and left 'int outbuf_size'
in its original position, so that outbuf_size and lock stay in the same
cache line. hvc_struct now changes as follows:

  struct hvc_struct {
         struct tty_port port;
         spinlock_t lock;
         int index;
         int do_wakeup;
-       char *outbuf;
         int outbuf_size;
         int n_outbuf;
         uint32_t vtermno;
@@ -48,6 +57,16 @@ struct hvc_struct {
         struct work_struct tty_resize;
         struct list_head next;
         unsigned long flags;
+
+       /*
+        * the buf is used in hvc console api for putting chars,
+        * and also used in hvc_poll_put_char() for putting single char.
+        */
+       char cons_outbuf[N_OUTBUF] __ALIGNED__;
+       spinlock_t cons_outbuf_lock;
+
+       /* the buf is used for putting chars to tty */
+       char outbuf[] __ALIGNED__;
  };

The pahole output for the hvc_struct above is shown below. Is this OK for
you? Do we need to pack the holes? Thanks.

struct hvc_struct {
     struct tty_port            port;                 /*     0 352 */
     /* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
     spinlock_t                 lock;                 /*   352 4 */
     int                        index;                /*   356 4 */
     int                        do_wakeup;            /*   360 4 */
     int                        outbuf_size;          /*   364 4 */
     int                        n_outbuf;             /*   368 4 */
     uint32_t                   vtermno;              /*   372 4 */
     const struct hv_ops  *     ops;                  /*   376 8 */
     /* --- cacheline 6 boundary (384 bytes) --- */
     int                        irq_requested;        /*   384 4 */
     int                        data;                 /*   388 4 */
     struct winsize             ws;                   /*   392 8 */
     struct work_struct         tty_resize;           /*   400 32 */
     struct list_head           next;                 /*   432 16 */
     /* --- cacheline 7 boundary (448 bytes) --- */
     long unsigned int          flags;                /*   448 8 */

     /* XXX 56 bytes hole, try to pack */

     /* --- cacheline 8 boundary (512 bytes) --- */
     char                       cons_outbuf[16];      /*   512 16 */
     spinlock_t                 cons_outbuf_lock;     /*   528 4 */

     /* XXX 44 bytes hole, try to pack */

     /* --- cacheline 9 boundary (576 bytes) --- */
     char                       outbuf[0];            /*   576 0 */

     /* size: 576, cachelines: 9, members: 17 */
     /* sum members: 476, holes: 2, sum holes: 100 */
};


>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread
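
Editorial aside on the revised structure above: with 'char outchar' gone, the single-character path has to stage its byte in cons_outbuf under cons_outbuf_lock instead of passing the address of a stack variable. The following is a minimal userspace sketch of that idea, not the patch itself. All `demo_*` names are hypothetical, a C11 atomic_flag spin loop stands in for spinlock_t, and the toy backend only records what it was handed.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_N_OUTBUF 16

/* Hypothetical callback type, modelled on the put_chars() operation. */
typedef int (*demo_put_chars_t)(uint32_t vtermno, const char *buf, int count);

struct demo_hvc {
	uint32_t vtermno;
	demo_put_chars_t put_chars;
	atomic_flag cons_outbuf_lock;	/* stand-in for spinlock_t */
	char cons_outbuf[DEMO_N_OUTBUF];
};

static void demo_lock(atomic_flag *l)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}

static void demo_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Toy backend: remembers the last buffer address and byte it received. */
static const char *demo_last_buf;
static char demo_last_byte;

static int demo_backend_put_chars(uint32_t vtermno, const char *buf, int count)
{
	(void)vtermno;
	demo_last_buf = buf;
	demo_last_byte = buf[0];
	return count;
}

/*
 * Send one character by staging it in the shared buffer, so the backend
 * sees an address inside the structure rather than the address of 'ch'.
 */
static int demo_poll_put_char(struct demo_hvc *hp, char ch)
{
	int n;

	demo_lock(&hp->cons_outbuf_lock);
	hp->cons_outbuf[0] = ch;
	do {
		n = hp->put_chars(hp->vtermno, hp->cons_outbuf, 1);
	} while (n <= 0);
	demo_unlock(&hp->cons_outbuf_lock);

	return n;
}
```

One lock covers both the multi-character and the single-character use of the buffer, which is what makes a separate 'outchar' and 'outchar_lock' unnecessary in this sketch.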

* Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()
  2021-10-14  8:34           ` Xianting Tian
  (?)
@ 2021-10-14  8:41             ` Greg KH
  -1 siblings, 0 replies; 26+ messages in thread
From: Greg KH @ 2021-10-14  8:41 UTC (permalink / raw)
  To: Xianting Tian
  Cc: jirislaby, amit, arnd, osandov, shile.zhang, linuxppc-dev,
	virtualization, linux-kernel

On Thu, Oct 14, 2021 at 04:34:59PM +0800, Xianting Tian wrote:
> 
> On 2021/10/10 at 1:33 PM, Greg KH wrote:
> > On Sat, Oct 09, 2021 at 11:45:23PM +0800, Xianting Tian wrote:
> > > On 2021/10/9 at 7:58 PM, Greg KH wrote:
> > > > Did you look at the placement using pahole as to how this structure now
> > > > looks?
> > > Thanks for all your comments. For this one, do you mean I need to remove the
> > > blank line? Thanks.
> > >
> > No, I mean to use the tool 'pahole' to see the structure layout that you
> > just created and determine if it really is the best way to add these new
> > fields, especially as you are adding huge buffers with odd alignment.
>
> Thanks.
>
> Based on your comments, I removed 'char outchar' and left 'int outbuf_size'
> in its original position, so that outbuf_size and lock stay in the same
> cache line. hvc_struct now changes as follows:
> 
>  struct hvc_struct {
>         struct tty_port port;
>         spinlock_t lock;
>         int index;
>         int do_wakeup;
> -       char *outbuf;
>         int outbuf_size;
>         int n_outbuf;
>         uint32_t vtermno;
> @@ -48,6 +57,16 @@ struct hvc_struct {
>         struct work_struct tty_resize;
>         struct list_head next;
>         unsigned long flags;
> +
> +       /*
> +        * the buf is used in hvc console api for putting chars,
> +        * and also used in hvc_poll_put_char() for putting single char.
> +        */
> +       char cons_outbuf[N_OUTBUF] __ALIGNED__;
> +       spinlock_t cons_outbuf_lock;
> +
> +       /* the buf is used for putting chars to tty */
> +       char outbuf[] __ALIGNED__;
>  };
> 
> The pahole output for the hvc_struct above is shown below. Is this OK for
> you? Do we need to pack the holes? Thanks.
> 
> struct hvc_struct {
>     struct tty_port            port;                 /*     0 352 */
>     /* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
>     spinlock_t                 lock;                 /*   352 4 */
>     int                        index;                /*   356 4 */
>     int                        do_wakeup;            /*   360 4 */
>     int                        outbuf_size;          /*   364 4 */
>     int                        n_outbuf;             /*   368 4 */
>     uint32_t                   vtermno;              /*   372 4 */
>     const struct hv_ops  *     ops;                  /*   376 8 */
>     /* --- cacheline 6 boundary (384 bytes) --- */
>     int                        irq_requested;        /*   384 4 */
>     int                        data;                 /*   388 4 */
>     struct winsize             ws;                   /*   392 8 */
>     struct work_struct         tty_resize;           /*   400 32 */
>     struct list_head           next;                 /*   432 16 */
>     /* --- cacheline 7 boundary (448 bytes) --- */
>     long unsigned int          flags;                /*   448 8 */
> 
>     /* XXX 56 bytes hole, try to pack */
> 
>     /* --- cacheline 8 boundary (512 bytes) --- */
>     char                       cons_outbuf[16];      /*   512 16 */
>     spinlock_t                 cons_outbuf_lock;     /*   528 4 */
> 
>     /* XXX 44 bytes hole, try to pack */

Why not move the spinlock up above the cons_outbuf?  Will that not be a
bit better?

Anyway, this is all fine, and much better than before, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread
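
Editorial aside on the suggestion above to move the spinlock ahead of cons_outbuf: the effect of the reordering can be checked with offsetof(). The sketch below is an assumption-laden illustration: the `demo_*` names are made up, `int` stands in for a 4-byte spinlock_t, and the buffer is given 64-byte alignment because the posted pahole output shows it starting on a cache-line boundary.

```c
#include <stddef.h>

#define DEMO_LINE 64	/* assumed cache-line size */

/* Order from the posted layout: buffer first, then its lock. */
struct demo_lock_after {
	unsigned long flags;
	char cons_outbuf[16] __attribute__((__aligned__(DEMO_LINE)));
	int cons_outbuf_lock;	/* stand-in for a 4-byte spinlock_t */
};

/* Suggested order: lock first, then the buffer. */
struct demo_lock_before {
	unsigned long flags;
	int cons_outbuf_lock;	/* stand-in for a 4-byte spinlock_t */
	char cons_outbuf[16] __attribute__((__aligned__(DEMO_LINE)));
};

/* Index of the cache line that a given byte offset falls into. */
static size_t demo_line_of(size_t offset)
{
	return offset / DEMO_LINE;
}
```

Under these assumptions both orderings have the same total size. What changes is where the lock lands: in the first ordering it shares a cache line with the buffer, in the second it moves into the partly empty line that already holds `flags`.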

* Re: [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars()
  2021-10-14  8:41             ` Greg KH
@ 2021-10-14  8:56               ` Xianting Tian
  -1 siblings, 0 replies; 26+ messages in thread
From: Xianting Tian @ 2021-10-14  8:56 UTC (permalink / raw)
  To: Greg KH
  Cc: jirislaby, amit, arnd, osandov, shile.zhang, linuxppc-dev,
	virtualization, linux-kernel


On 2021/10/14 4:41 PM, Greg KH wrote:
> On Thu, Oct 14, 2021 at 04:34:59PM +0800, Xianting Tian wrote:
>> On 2021/10/10 1:33 PM, Greg KH wrote:
>>> On Sat, Oct 09, 2021 at 11:45:23PM +0800, Xianting Tian wrote:
>>>> On 2021/10/9 7:58 PM, Greg KH wrote:
>>>>> Did you look at the placement using pahole as to how this structure now
>>>>> looks?
>>>> Thanks for all your comments. For this one, do you mean I need to remove the
>>>> blank line? Thanks
>>>>
>>> No, I mean to use the tool 'pahole' to see the structure layout that you
>>> just created and determine if it really is the best way to add these new
>>> fields, especially as you are adding huge buffers with odd alignment.
>> thanks,
>>
>> Based on your comments, I removed 'char outchar' and left the position of
>> 'int outbuf_size' unchanged, to keep outbuf_size and lock in the same cache
>> line. hvc_struct now changes as below:
>>
>>   struct hvc_struct {
>>          struct tty_port port;
>>          spinlock_t lock;
>>          int index;
>>          int do_wakeup;
>> -       char *outbuf;
>>          int outbuf_size;
>>          int n_outbuf;
>>          uint32_t vtermno;
>> @@ -48,6 +57,16 @@ struct hvc_struct {
>>          struct work_struct tty_resize;
>>          struct list_head next;
>>          unsigned long flags;
>> +
>> +       /*
>> +        * the buf is used in hvc console api for putting chars,
>> +        * and also used in hvc_poll_put_char() for putting single char.
>> +        */
>> +       char cons_outbuf[N_OUTBUF] __ALIGNED__;
>> +       spinlock_t cons_outbuf_lock;
>> +
>> +       /* the buf is used for putting chars to tty */
>> +       char outbuf[] __ALIGNED__;
>>   };
>>
>> The pahole output for the above hvc_struct is below. Is it OK for you? Do we
>> need to pack the holes? Thanks
>>
>> struct hvc_struct {
>>      struct tty_port            port;                 /*     0 352 */
>>      /* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
>>      spinlock_t                 lock;                 /*   352 4 */
>>      int                        index;                /*   356 4 */
>>      int                        do_wakeup;            /*   360 4 */
>>      int                        outbuf_size;          /*   364 4 */
>>      int                        n_outbuf;             /*   368 4 */
>>      uint32_t                   vtermno;              /*   372 4 */
>>      const struct hv_ops  *     ops;                  /*   376 8 */
>>      /* --- cacheline 6 boundary (384 bytes) --- */
>>      int                        irq_requested;        /*   384 4 */
>>      int                        data;                 /*   388 4 */
>>      struct winsize             ws;                   /*   392 8 */
>>      struct work_struct         tty_resize;           /*   400 32 */
>>      struct list_head           next;                 /*   432 16 */
>>      /* --- cacheline 7 boundary (448 bytes) --- */
>>      long unsigned int          flags;                /*   448 8 */
>>
>>      /* XXX 56 bytes hole, try to pack */
>>
>>      /* --- cacheline 8 boundary (512 bytes) --- */
>>      char                       cons_outbuf[16];      /*   512 16 */
>>      spinlock_t                 cons_outbuf_lock;     /*   528 4 */
>>
>>      /* XXX 44 bytes hole, try to pack */
> Why not move the spinlock up above the cons_outbuf?  Will that not be a
> bit better?
Thanks, I will move it above cons_outbuf and send the v11 patches soon.
>
> Anyway, this is all fine, and much better than before, thanks.
>
> greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-10-14  8:56 UTC | newest]

Thread overview: 26+ messages
2021-10-09 11:48 [PATCH v10 0/3] make hvc pass dma capable memory to its backend Xianting Tian
2021-10-09 11:48 ` Xianting Tian
2021-10-09 11:48 ` [PATCH v10 1/3] tty: hvc: use correct dma alignment size Xianting Tian
2021-10-09 11:48   ` Xianting Tian
2021-10-09 11:48 ` [PATCH v10 2/3] tty: hvc: pass DMA capable memory to put_chars() Xianting Tian
2021-10-09 11:48   ` Xianting Tian
2021-10-09 11:55   ` Greg KH
2021-10-09 11:55     ` Greg KH
2021-10-09 11:55     ` Greg KH
2021-10-09 11:58   ` Greg KH
2021-10-09 11:58     ` Greg KH
2021-10-09 11:58     ` Greg KH
2021-10-09 15:45     ` Xianting Tian
2021-10-09 15:45       ` Xianting Tian
2021-10-10  5:33       ` Greg KH
2021-10-10  5:33         ` Greg KH
2021-10-10  5:33         ` Greg KH
2021-10-14  8:34         ` Xianting Tian
2021-10-14  8:34           ` Xianting Tian
2021-10-14  8:41           ` Greg KH
2021-10-14  8:41             ` Greg KH
2021-10-14  8:41             ` Greg KH
2021-10-14  8:56             ` Xianting Tian
2021-10-14  8:56               ` Xianting Tian
2021-10-09 11:48 ` [PATCH v10 3/3] virtio-console: remove unnecessary kmemdup() Xianting Tian
2021-10-09 11:48   ` Xianting Tian
