* Possible bug? DOM-U network stopped working after fatal error reported in DOM0
@ 2021-12-18 18:35 G.R.
  2021-12-19  6:10 ` Juergen Gross
  (2 more replies)

From: G.R. @ 2021-12-18 18:35 UTC
To: xen-devel

Hi all,

I ran into the following error report in the DOM0 kernel after a recent upgrade:
[  501.840816] vif vif-1-0 vif1.0: Cross page boundary, txp->offset: 2872, size: 1460
[  501.840828] vif vif-1-0 vif1.0: fatal error; disabling device
[  501.841076] xenbr0: port 2(vif1.0) entered disabled state

Once this error happens, the DOM-U behind this vif is no longer accessible,
and recreating the same DOM-U does not fix the problem. Only a reboot of the
physical host machine helps.

The problem showed up after a recent upgrade of the DOM-U OS from FreeNAS 11.3
to TrueNAS 12.0U7 and breaks the iSCSI service while leaving other services
like NFS intact. The underlying OS for the NAS is FreeBSD, version 11.3 and
12.2 respectively.

So far I have tried the following combos:
- Linux 4.19 DOM0 + XEN 4.8  + FreeBSD 11.3 DOM-U: Good
- Linux 4.19 DOM0 + XEN 4.8  + FreeBSD 12.2 DOM-U: Regressed
- Linux 5.10 DOM0 + XEN 4.8  + FreeBSD 12.2 DOM-U: Regressed
- Linux 5.10 DOM0 + XEN 4.11 + FreeBSD 12.2 DOM-U: Regressed

I plan to try out XEN 4.14, which is the latest version I can get from the
distro (Debian). If that still does not fix the problem, I will build 4.16
from source as a last resort.

I have to admit that this trial process is blind, as I have no idea which
component in the combo is to blame. Is it a bug in the backend driver, the
frontend driver or the hypervisor itself? Or due to incompatible versions?
Any suggestions on other diagnostic ideas (e.g. debug logs) are welcome while
I work on the planned experiments.

Thanks,
G.R.
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

From: Juergen Gross @ 2021-12-19 6:10 UTC
To: G.R., xen-devel

On 18.12.21 19:35, G.R. wrote:
> Hi all,
>
> I ran into the following error report in the DOM0 kernel after a recent upgrade:
> [  501.840816] vif vif-1-0 vif1.0: Cross page boundary, txp->offset: 2872, size: 1460
> [  501.840828] vif vif-1-0 vif1.0: fatal error; disabling device

The dom0 network backend has detected an inconsistency in the data received
from the domU's frontend. In this case a request's memory buffer crossed a
page boundary, which is not allowed.

There has been a recent change in the Xen netback driver to stop the
interface in such conditions, as invalid requests like these are regarded as
malicious and might lead to crashes in dom0.

So this issue should be reported to the FreeBSD maintainers in order to have
the Xen netfront driver fixed there.

> [  501.841076] xenbr0: port 2(vif1.0) entered disabled state
> Once this error happens, the DOM-U behind this vif is no longer
> accessible, and recreating the same DOM-U does not fix the problem.
> Only a reboot of the physical host machine helps.
>
> The problem showed up after a recent upgrade of the DOM-U OS from
> FreeNAS 11.3 to TrueNAS 12.0U7 and breaks the iSCSI service while
> leaving other services like NFS intact.
> The underlying OS for the NAS is FreeBSD, version 11.3 and 12.2 respectively.
> So far I have tried the following combos:
> - Linux 4.19 DOM0 + XEN 4.8  + FreeBSD 11.3 DOM-U: Good
> - Linux 4.19 DOM0 + XEN 4.8  + FreeBSD 12.2 DOM-U: Regressed
> - Linux 5.10 DOM0 + XEN 4.8  + FreeBSD 12.2 DOM-U: Regressed
> - Linux 5.10 DOM0 + XEN 4.11 + FreeBSD 12.2 DOM-U: Regressed

This information (especially the FreeBSD versions affected) is probably
important for the FreeBSD maintainers.

> I plan to try out XEN 4.14, which is the latest version I can get
> from the distro (Debian).
> If that still does not fix the problem, I will build 4.16 from source
> as a last resort.

Xen is NOT to blame here.

Juergen
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

From: G.R. @ 2021-12-19 17:31 UTC
To: xen-devel

On Sun, Dec 19, 2021 at 2:35 AM G.R. <firemeteor@users.sourceforge.net> wrote:
>
> Hi all,
>
> I ran into the following error report in the DOM0 kernel after a recent upgrade:
> [  501.840816] vif vif-1-0 vif1.0: Cross page boundary, txp->offset: 2872, size: 1460
> [  501.840828] vif vif-1-0 vif1.0: fatal error; disabling device
> [  501.841076] xenbr0: port 2(vif1.0) entered disabled state
> Once this error happens, the DOM-U behind this vif is no longer
> accessible, and recreating the same DOM-U does not fix the problem.
> Only a reboot of the physical host machine helps.
>
> The problem showed up after a recent upgrade of the DOM-U OS from
> FreeNAS 11.3 to TrueNAS 12.0U7 and breaks the iSCSI service while
> leaving other services like NFS intact.

To clarify: mounting an iSCSI disk triggers the problem immediately.

> The underlying OS for the NAS is FreeBSD, version 11.3 and 12.2 respectively.
> So far I have tried the following combos:
> - Linux 4.19 DOM0 + XEN 4.8  + FreeBSD 11.3 DOM-U: Good
> - Linux 4.19 DOM0 + XEN 4.8  + FreeBSD 12.2 DOM-U: Regressed
> - Linux 5.10 DOM0 + XEN 4.8  + FreeBSD 12.2 DOM-U: Regressed
> - Linux 5.10 DOM0 + XEN 4.11 + FreeBSD 12.2 DOM-U: Regressed

- Linux 5.10 DOM0 + XEN 4.14 + FreeBSD 12.2 DOM-U: Regressed

> I plan to try out XEN 4.14, which is the latest version I can get
> from the distro (Debian).

I just upgraded to Debian bullseye (11) from buster (10) and migrated to
XEN 4.14 as a result. The syndrome persists, unfortunately.
BTW, my Dom0 kernel is a custom-built version. Could any kernel config
option contribute to this problem?

> If that still does not fix the problem, I will build 4.16 from source
> as a last resort.
>
> I have to admit that this trial process is blind, as I have no idea
> which component in the combo is to blame. Is it a bug in the
> backend driver, the frontend driver or the hypervisor itself? Or due to
> incompatible versions? Any suggestions on other diagnostic ideas (e.g.
> debug logs) are welcome while I work on the planned experiments.
>
> Thanks,
> G.R.
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

From: G.R. @ 2021-12-20 17:13 UTC
To: xen-devel

First of all, thank you for your quick responses, Juergen and Roger.
I just realized that I ran into a mail forwarding issue with the sourceforge
mail alias service, and only found the responses when I checked the list
archive. As a result, I have to manually merge Roger's response into this
reply...

> > I have to admit that this trial process is blind, as I have no idea
> > which component in the combo is to blame. Is it a bug in the
> > backend driver, the frontend driver or the hypervisor itself? Or due to
> > incompatible versions? Any suggestions on other diagnostic ideas (e.g.
> > debug logs) are welcome while I work on the planned experiments.
>
> This is a bug in FreeBSD netfront, so it shows up no matter which Linux
> or Xen version you use.
>
> Does it make a difference if you disable TSO and LRO from netfront?
>
> $ ifconfig xn0 -tso -lro

It does not; the fatal error still shows up after this command.

> Do you have instructions I can follow in order to try to reproduce the
> issue?

I don't know if there are any special details in my setup.
Hopefully I haven't missed anything useful:
1. Build a TrueNAS 12.0U7 DOM-U by writing the OS image onto a vdisk
2. Create / import a ZFS pool in the DOM-U
3. Create and share some file-based iSCSI extents on the pool
4. Mount the iSCSI extents through some initiator clients.
The domU xn0 should be disabled immediately after step #4.

I omitted all operational details on the assumption that you are familiar
with TrueNAS and iSCSI setup.
For step #4, I can reproduce the issue with both the iPXE initiator and the
Win7 built-in client. As a result, I assume the client version does not
matter.
For #2, I actually have a physical disk and controller assigned to the
DOM-U, but I suspect this is probably irrelevant.
For #3, I'm not sure if the content of the extent matters. So far I have
been testing the same extent, which is formatted as an NTFS disk.

> Thanks, Roger.

> > Thanks,
> > G.R.
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

From: Roger Pau Monné @ 2021-12-21 13:50 UTC
To: G.R.; +Cc: xen-devel

On Tue, Dec 21, 2021 at 01:13:43AM +0800, G.R. wrote:
> First of all, thank you for your quick responses, Juergen and Roger.
> I just realized that I ran into a mail forwarding issue with the
> sourceforge mail alias service, and only found the responses when I
> checked the list archive. As a result, I have to manually merge Roger's
> response into this reply...
>
> > > I have to admit that this trial process is blind, as I have no idea
> > > which component in the combo is to blame. Is it a bug in the
> > > backend driver, the frontend driver or the hypervisor itself? Or due to
> > > incompatible versions? Any suggestions on other diagnostic ideas (e.g.
> > > debug logs) are welcome while I work on the planned experiments.
> >
> > This is a bug in FreeBSD netfront, so it shows up no matter which Linux
> > or Xen version you use.
> >
> > Does it make a difference if you disable TSO and LRO from netfront?
> >
> > $ ifconfig xn0 -tso -lro
>
> It does not; the fatal error still shows up after this command.

Thanks for testing.

> > Do you have instructions I can follow in order to try to reproduce the
> > issue?
>
> I don't know if there are any special details in my setup.
> Hopefully I haven't missed anything useful:
> 1. Build a TrueNAS 12.0U7 DOM-U by writing the OS image onto a vdisk
> 2. Create / import a ZFS pool in the DOM-U
> 3. Create and share some file-based iSCSI extents on the pool
> 4. Mount the iSCSI extents through some initiator clients.
> The domU xn0 should be disabled immediately after step #4.
>
> I omitted all operational details on the assumption that you are familiar
> with TrueNAS and iSCSI setup.

Not really. Ideally I would like a way to reproduce this that can be done
using iperf, nc or a similar simple command line tool, without requiring an
iSCSI setup.

> For step #4, I can reproduce the issue with both the iPXE initiator and
> the Win7 built-in client. As a result, I assume the client version does
> not matter.
> For #2, I actually have a physical disk and controller assigned to the
> DOM-U, but I suspect this is probably irrelevant.
> For #3, I'm not sure if the content of the extent matters. So far I have
> been testing the same extent, which is formatted as an NTFS disk.

Can you also paste the output of `ifconfig xn0`?

If I provided a patch for the FreeBSD kernel, would you be able to apply
and test it?

Thanks, Roger.
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

From: G.R. @ 2021-12-21 18:19 UTC
To: Roger Pau Monné; +Cc: xen-devel

> > I omitted all operational details on the assumption that you are familiar
> > with TrueNAS and iSCSI setup.
>
> Not really. Ideally I would like a way to reproduce this that can be done
> using iperf, nc or a similar simple command line tool, without requiring
> an iSCSI setup.

I think that would be tricky, then. The problem hides itself well enough
that I wasn't aware of it soon after upgrading, since everything else works
flawlessly -- NFS, SSH, web etc.

> Can you also paste the output of `ifconfig xn0`?

Here it is:
xn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=503<RXCSUM,TXCSUM,TSO4,LRO>
        ether 00:18:3c:51:6e:4c
        inet 192.168.1.9 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet manual
        status: active
        nd6 options=1<PERFORMNUD>

> If I provided a patch for the FreeBSD kernel, would you be able to
> apply and test it?

Probably. I did this before, when your XEN support for FreeBSD was not yet
available out of the box. I just need to recreate the required environment
to apply the patch.

BTW, uname -a gives me the following:
> 12.2-RELEASE-p11 FreeBSD 12.2-RELEASE-p11 75566f060d4(HEAD) TRUENAS amd64

Thanks,
Timothy
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

From: Roger Pau Monné @ 2021-12-21 19:12 UTC
To: G.R.; +Cc: xen-devel

On Wed, Dec 22, 2021 at 02:19:03AM +0800, G.R. wrote:
> > > I omitted all operational details on the assumption that you are
> > > familiar with TrueNAS and iSCSI setup.
> >
> > Not really. Ideally I would like a way to reproduce this that can be
> > done using iperf, nc or a similar simple command line tool, without
> > requiring an iSCSI setup.
>
> I think that would be tricky, then. The problem hides itself well enough
> that I wasn't aware of it soon after upgrading, since everything else
> works flawlessly -- NFS, SSH, web etc.
>
> > Can you also paste the output of `ifconfig xn0`?
>
> Here it is:
> xn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=503<RXCSUM,TXCSUM,TSO4,LRO>
>         ether 00:18:3c:51:6e:4c
>         inet 192.168.1.9 netmask 0xffffff00 broadcast 192.168.1.255
>         media: Ethernet manual
>         status: active
>         nd6 options=1<PERFORMNUD>
>
> > If I provided a patch for the FreeBSD kernel, would you be able to
> > apply and test it?
>
> Probably. I did this before, when your XEN support for FreeBSD was not
> yet available out of the box. I just need to recreate the required
> environment to apply the patch.

Could you build a debug kernel with the following patch applied and give me
the trace when it explodes?

Thanks, Roger.

---
diff --git a/sys/dev/xen/netfront/netfront.c b/sys/dev/xen/netfront/netfront.c
index fd2d97a7c70c..87bc3ecfc4dd 100644
--- a/sys/dev/xen/netfront/netfront.c
+++ b/sys/dev/xen/netfront/netfront.c
@@ -1519,8 +1519,12 @@ xn_count_frags(struct mbuf *m)
 {
     int nfrags;

-    for (nfrags = 0; m != NULL; m = m->m_next)
+    for (nfrags = 0; m != NULL; m = m->m_next) {
+        KASSERT(
+            (mtod(m, vm_offset_t) & PAGE_MASK) + m->m_len <= PAGE_SIZE,
+            ("mbuf fragment crosses a page boundary"));
         nfrags++;
+    }

     return (nfrags);
 }
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

From: G.R. @ 2021-12-23 15:49 UTC
To: Roger Pau Monné; +Cc: xen-devel

On Wed, Dec 22, 2021 at 3:13 AM Roger Pau Monné <roger.pau@citrix.com> wrote:

> Could you build a debug kernel with the following patch applied and
> give me the trace when it explodes?

Please find the trace and the kernel commit below.
Note, the domU gets stuck in a boot loop with this assertion, as the
situation comes back after a domU restart and only a dom0 reboot gets
things back to normal.
The trace I captured below is from within the boot loop. I suspect the
initial trigger may look different; I will give it another try soon.

FreeBSD 12.2-RELEASE-p11 #0 c8625d629c3(truenas/12.0-stable)-dirty:
Wed Dec 22 20:26:46 UTC 2021
The repo is here: https://github.com/freenas/os.git

db:0:kdb.enter.default> bt
Tracing pid 0 tid 101637 td 0xfffff80069cc4000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe009f121460
vpanic() at vpanic+0x197/frame 0xfffffe009f1214b0
panic() at panic+0x43/frame 0xfffffe009f121510
xn_txq_mq_start_locked() at xn_txq_mq_start_locked+0x4c6/frame 0xfffffe009f121580
xn_txq_mq_start() at xn_txq_mq_start+0x84/frame 0xfffffe009f1215b0
ether_output_frame() at ether_output_frame+0xb4/frame 0xfffffe009f1215e0
ether_output() at ether_output+0x6a5/frame 0xfffffe009f121680
ip_output() at ip_output+0x1319/frame 0xfffffe009f1217e0
tcp_output() at tcp_output+0x1dbf/frame 0xfffffe009f121980
tcp_usr_send() at tcp_usr_send+0x3c9/frame 0xfffffe009f121a40
sosend_generic() at sosend_generic+0x440/frame 0xfffffe009f121af0
sosend() at sosend+0x66/frame 0xfffffe009f121b20
icl_send_thread() at icl_send_thread+0x44e/frame 0xfffffe009f121bb0
fork_exit() at fork_exit+0x80/frame 0xfffffe009f121bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009f121bf0
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

From: Roger Pau Monné @ 2021-12-24 11:24 UTC
To: G.R.; +Cc: xen-devel

On Thu, Dec 23, 2021 at 11:49:08PM +0800, G.R. wrote:
> On Wed, Dec 22, 2021 at 3:13 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> > Could you build a debug kernel with the following patch applied and
> > give me the trace when it explodes?
>
> Please find the trace and the kernel commit below.
> Note, the domU gets stuck in a boot loop with this assertion, as the
> situation comes back after a domU restart and only a dom0 reboot gets
> things back to normal.
> The trace I captured below is from within the boot loop. I suspect the
> initial trigger may look different; I will give it another try soon.
>
> FreeBSD 12.2-RELEASE-p11 #0 c8625d629c3(truenas/12.0-stable)-dirty:
> Wed Dec 22 20:26:46 UTC 2021
> The repo is here: https://github.com/freenas/os.git
>
> db:0:kdb.enter.default> bt
> Tracing pid 0 tid 101637 td 0xfffff80069cc4000
> kdb_enter() at kdb_enter+0x37/frame 0xfffffe009f121460
> vpanic() at vpanic+0x197/frame 0xfffffe009f1214b0
> panic() at panic+0x43/frame 0xfffffe009f121510
> xn_txq_mq_start_locked() at xn_txq_mq_start_locked+0x4c6/frame 0xfffffe009f121580
> xn_txq_mq_start() at xn_txq_mq_start+0x84/frame 0xfffffe009f1215b0
> ether_output_frame() at ether_output_frame+0xb4/frame 0xfffffe009f1215e0
> ether_output() at ether_output+0x6a5/frame 0xfffffe009f121680
> ip_output() at ip_output+0x1319/frame 0xfffffe009f1217e0
> tcp_output() at tcp_output+0x1dbf/frame 0xfffffe009f121980
> tcp_usr_send() at tcp_usr_send+0x3c9/frame 0xfffffe009f121a40
> sosend_generic() at sosend_generic+0x440/frame 0xfffffe009f121af0
> sosend() at sosend+0x66/frame 0xfffffe009f121b20
> icl_send_thread() at icl_send_thread+0x44e/frame 0xfffffe009f121bb0
> fork_exit() at fork_exit+0x80/frame 0xfffffe009f121bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009f121bf0

Thanks. I've raised this on freebsd-net for advice [0]. IMO netfront
shouldn't receive an mbuf that crosses a page boundary, but if that's indeed
a legit mbuf I will figure out the best way to handle it.

I have a clumsy patch (below) that might solve this, if you want to give it
a try.

Regards, Roger.

[0] https://lists.freebsd.org/archives/freebsd-net/2021-December/001179.html

---
diff --git a/sys/dev/xen/netfront/netfront.c b/sys/dev/xen/netfront/netfront.c
index 87bc3ecfc4dd..c8f807778b75 100644
--- a/sys/dev/xen/netfront/netfront.c
+++ b/sys/dev/xen/netfront/netfront.c
@@ -1529,6 +1529,35 @@ xn_count_frags(struct mbuf *m)
     return (nfrags);
 }

+static inline int fragment(struct mbuf *m)
+{
+    while (m != NULL) {
+        vm_offset_t offset = mtod(m, vm_offset_t) & PAGE_MASK;
+
+        if (offset + m->m_len > PAGE_SIZE) {
+            /* Split mbuf because it crosses a page boundary. */
+            struct mbuf *m_new = m_getcl(M_NOWAIT, MT_DATA, 0);
+
+            if (m_new == NULL)
+                return (ENOMEM);
+
+            m_copydata(m, 0, m->m_len - (PAGE_SIZE - offset),
+                mtod(m_new, caddr_t));
+
+            /* Set adjusted mbuf sizes. */
+            m_new->m_len = m->m_len - (PAGE_SIZE - offset);
+            m->m_len = PAGE_SIZE - offset;
+
+            /* Insert new mbuf into chain. */
+            m_new->m_next = m->m_next;
+            m->m_next = m_new;
+        }
+        m = m->m_next;
+    }
+
+    return (0);
+}
+
 /**
  * Given an mbuf chain, make sure we have enough room and then push
  * it onto the transmit ring.
@@ -1541,6 +1570,12 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
     struct ifnet *ifp = np->xn_ifp;
     u_int nfrags;
     int otherend_id;
+    int rc;
+
+    /* Fragment if mbuf crosses a page boundary. */
+    rc = fragment(m_head);
+    if (rc != 0)
+        return (rc);

     /**
      * Defragment the mbuf if necessary.
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

From: G.R. @ 2021-12-25 16:39 UTC
To: Roger Pau Monné; +Cc: xen-devel

> > Please find the trace and the kernel commit below.
> > Note, the domU gets stuck in a boot loop with this assertion, as the
> > situation comes back after a domU restart and only a dom0 reboot gets
> > things back to normal.
> > The trace I captured below is from within the boot loop. I suspect the
> > initial trigger may look different; I will give it another try soon.

I think I figured out the cause of the boot loop. It was not due to some
mystery offender packet from the FreeBSD domU surviving across the reboot,
but because the Windows iSCSI initiator keeps retrying :-)

That said, I did pay some price figuring this out. The boot loop seems to
have brought my box into a weird state in which the disks behind the
controller keep detaching soon after a NAS domU reboot. Rebooting the
physical host did not help this time. I was almost desperate, but
thankfully running the NAS on the physical host directly still works, and
switching it back to a domU together with a config reload fixed the
problem :-) Not sure if it was the domU config being corrupted or something
left sticky in the PCI pass-through?

> Thanks. I've raised this on freebsd-net for advice [0]. IMO netfront
> shouldn't receive an mbuf that crosses a page boundary, but if that's
> indeed a legit mbuf I will figure out the best way to handle it.
>
> I have a clumsy patch (below) that might solve this, if you want to
> give it a try.

Applied the patch and it worked like a charm!
Thank you so much for your quick help!
Wish you a wonderful holiday!

Thanks,
G.R.
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

From: G.R. @ 2021-12-25 18:06 UTC
To: Roger Pau Monné; +Cc: xen-devel

> > Thanks. I've raised this on freebsd-net for advice [0]. IMO netfront
> > shouldn't receive an mbuf that crosses a page boundary, but if that's
> > indeed a legit mbuf I will figure out the best way to handle it.
> >
> > I have a clumsy patch (below) that might solve this, if you want to
> > give it a try.
>
> Applied the patch and it worked like a charm!
> Thank you so much for your quick help!
> Wish you a wonderful holiday!

I may have spoken too soon...
With the patch I can attach the iSCSI disk, and neither the dom0 nor the
NAS domU complains this time. But when I attempt to mount the attached disk
it reports I/O errors randomly. By randomly I mean different disks behave
differently... I don't see any error logs from the kernels this time.
(Most of the iSCSI disks are NTFS and mounted through the user-mode fuse
library.)
But since I have a local backup copy of the image, I can confirm that
mounting that backup image does not result in any I/O errors.
Looks like something is still broken here...
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0

From: Roger Pau Monné @ 2021-12-27 19:04 UTC
To: G.R.; +Cc: xen-devel

On Sun, Dec 26, 2021 at 02:06:55AM +0800, G.R. wrote:
> > > Thanks. I've raised this on freebsd-net for advice [0]. IMO netfront
> > > shouldn't receive an mbuf that crosses a page boundary, but if that's
> > > indeed a legit mbuf I will figure out the best way to handle it.
> > >
> > > I have a clumsy patch (below) that might solve this, if you want to
> > > give it a try.
> >
> > Applied the patch and it worked like a charm!
> > Thank you so much for your quick help!
> > Wish you a wonderful holiday!
>
> I may have spoken too soon...
> With the patch I can attach the iSCSI disk, and neither the dom0 nor the
> NAS domU complains this time. But when I attempt to mount the attached
> disk it reports I/O errors randomly. By randomly I mean different disks
> behave differently... I don't see any error logs from the kernels this
> time. (Most of the iSCSI disks are NTFS and mounted through the
> user-mode fuse library.)
> But since I have a local backup copy of the image, I can confirm that
> mounting that backup image does not result in any I/O errors.
> Looks like something is still broken here...

Indeed. That patch was likely too simple, and didn't properly handle the
split of mbuf data buffers.

I have another version based on using sglist, which I think is also a
worthwhile change for netfront. Can you please give it a try? I've done a
very simple test and it seems fine, but you certainly have more interesting
cases. You will have to apply it on top of a clean tree, without any of the
other patches applied.

Thanks, Roger.

---
diff --git a/sys/dev/xen/netfront/netfront.c b/sys/dev/xen/netfront/netfront.c
index 8dba5a8dc6d5..37ea7b1fa059 100644
--- a/sys/dev/xen/netfront/netfront.c
+++ b/sys/dev/xen/netfront/netfront.c
@@ -33,6 +33,8 @@ __FBSDID("$FreeBSD$");
 #include "opt_inet.h"
 #include "opt_inet6.h"

+#include <sys/types.h>
+
 #include <sys/param.h>
 #include <sys/sockio.h>
 #include <sys/limits.h>
@@ -40,6 +42,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/malloc.h>
 #include <sys/module.h>
 #include <sys/kernel.h>
+#include <sys/sglist.h>
 #include <sys/socket.h>
 #include <sys/sysctl.h>
 #include <sys/taskqueue.h>
@@ -199,6 +202,12 @@ struct netfront_txq {
     struct taskqueue    *tq;
     struct task         defrtask;

+    struct sglist       *segments;
+    struct mbuf_refcount {
+        struct m_tag    tag;
+        u_int           count;
+    } refcount_tag[NET_TX_RING_SIZE + 1];
+
     bool                full;
 };
@@ -301,6 +310,38 @@ xn_get_rx_ref(struct netfront_rxq *rxq, RING_IDX ri)
     return (ref);
 }

+#define MTAG_REFCOUNT 0
+
+static void mbuf_grab(uint32_t cookie, struct mbuf *m)
+{
+    struct mbuf_refcount *ref;
+
+    ref = (struct mbuf_refcount *)m_tag_locate(m, cookie, MTAG_REFCOUNT,
+        NULL);
+    KASSERT(ref != NULL, ("Cannot find refcount"));
+    ref->count++;
+}
+
+static void mbuf_release(uint32_t cookie, struct mbuf *m)
+{
+    struct mbuf_refcount *ref;
+
+    ref = (struct mbuf_refcount *)m_tag_locate(m, cookie, MTAG_REFCOUNT,
+        NULL);
+    KASSERT(ref != NULL, ("Cannot find refcount"));
+    KASSERT(ref->count > 0, ("Invalid reference count"));
+
+    if(--ref->count == 0)
+        m_freem(m);
+}
+
+static void tag_free(struct m_tag *t)
+{
+    struct mbuf_refcount *ref = (struct mbuf_refcount *)t;
+
+    KASSERT(ref->count == 0, ("Free mbuf tag with pending refcnt"));
+}
+
 #define IPRINTK(fmt, args...) \
     printf("[XEN] " fmt, ##args)
 #ifdef INVARIANTS
@@ -778,7 +819,7 @@ disconnect_txq(struct netfront_txq *txq)
 static void
 destroy_txq(struct netfront_txq *txq)
 {
-
+    sglist_free(txq->segments);
     free(txq->ring.sring, M_DEVBUF);
     buf_ring_free(txq->br, M_DEVBUF);
     taskqueue_drain_all(txq->tq);
@@ -826,6 +867,11 @@ setup_txqs(device_t dev, struct netfront_info *info,
         for (i = 0; i <= NET_TX_RING_SIZE; i++) {
             txq->mbufs[i] = (void *) ((u_long) i+1);
             txq->grant_ref[i] = GRANT_REF_INVALID;
+            m_tag_setup(&txq->refcount_tag[i].tag,
+                (unsigned long)txq, MTAG_REFCOUNT,
+                sizeof(txq->refcount_tag[i]) -
+                sizeof(txq->refcount_tag[i].tag));
+            txq->refcount_tag[i].tag.m_tag_free = &tag_free;
         }
         txq->mbufs[NET_TX_RING_SIZE] = (void *)0;
@@ -874,10 +920,18 @@ setup_txqs(device_t dev, struct netfront_info *info,
             device_printf(dev,
                 "xen_intr_alloc_and_bind_local_port failed\n");
             goto fail_bind_port;
         }
+
+        txq->segments = sglist_alloc(MAX_TX_REQ_FRAGS, M_WAITOK);
+        if (txq->segments == NULL) {
+            device_printf(dev, "failed to allocate sglist\n");
+            goto fail_sglist;
+        }
     }

     return (0);

+fail_sglist:
+    xen_intr_unbind(&txq->xen_intr_handle);
 fail_bind_port:
     taskqueue_drain_all(txq->tq);
 fail_start_thread:
@@ -1041,7 +1095,7 @@ xn_release_tx_bufs(struct netfront_txq *txq)
         if (txq->mbufs_cnt < 0) {
             panic("%s: tx_chain_cnt must be >= 0", __func__);
         }
-        m_free(m);
+        mbuf_release((unsigned long)txq, m);
     }
 }
@@ -1311,7 +1365,7 @@ xn_txeof(struct netfront_txq *txq)
         txq->mbufs[id] = NULL;
         add_id_to_freelist(txq->mbufs, id);
         txq->mbufs_cnt--;
-        m_free(m);
+        mbuf_release((unsigned long)txq, m);
         /* Only mark the txq active if we've freed up at least one slot to try */
         ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
     }
@@ -1507,22 +1561,6 @@ xn_get_responses(struct netfront_rxq *rxq,
     return (err);
 }

-/**
- * \brief Count the number of fragments in an mbuf chain.
- *
- * Surprisingly, there isn't an M* macro for this.
- */
-static inline int
-xn_count_frags(struct mbuf *m)
-{
-    int nfrags;
-
-    for (nfrags = 0; m != NULL; m = m->m_next)
-        nfrags++;
-
-    return (nfrags);
-}
-
 /**
  * Given an mbuf chain, make sure we have enough room and then push
  * it onto the transmit ring.
@@ -1530,16 +1568,22 @@ xn_count_frags(struct mbuf *m)
 static int
 xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
 {
-    struct mbuf *m;
     struct netfront_info *np = txq->info;
     struct ifnet *ifp = np->xn_ifp;
-    u_int nfrags;
-    int otherend_id;
+    u_int nfrags, i;
+    int otherend_id, rc;
+
+    sglist_reset(txq->segments);
+    rc = sglist_append_mbuf(txq->segments, m_head);
+    if (rc != 0) {
+        m_freem(m_head);
+        return (rc);
+    }

     /**
      * Defragment the mbuf if necessary.
      */
-    nfrags = xn_count_frags(m_head);
+    nfrags = txq->segments->sg_nseg;

     /*
      * Check to see whether this request is longer than netback
@@ -1551,6 +1595,8 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
      * the Linux network stack.
      */
     if (nfrags > np->maxfrags) {
+        struct mbuf *m;
+
         m = m_defrag(m_head, M_NOWAIT);
         if (!m) {
             /*
@@ -1561,11 +1607,15 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
             return (EMSGSIZE);
         }
         m_head = m;
+        sglist_reset(txq->segments);
+        rc = sglist_append_mbuf(txq->segments, m_head);
+        if (rc != 0) {
+            m_freem(m_head);
+            return (rc);
+        }
+        nfrags = txq->segments->sg_nseg;
     }

-    /* Determine how many fragments now exist */
-    nfrags = xn_count_frags(m_head);
-
     /*
      * Check to see whether the defragmented packet has too many
      * segments for the Linux netback driver.
@@ -1604,14 +1654,15 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
      * the fragment pointers.  Stop when we run out
      * of fragments or hit the end of the mbuf chain.
      */
-    m = m_head;
     otherend_id = xenbus_get_otherend_id(np->xbdev);
-    for (m = m_head; m; m = m->m_next) {
+    for (i = 0; i < nfrags; i++) {
         netif_tx_request_t *tx;
         uintptr_t id;
         grant_ref_t ref;
         u_long mfn; /* XXX Wrong type? */
+        struct sglist_seg *seg;

+        seg = &txq->segments->sg_segs[i];
         tx = RING_GET_REQUEST(&txq->ring, txq->ring.req_prod_pvt);
         id = get_id_from_freelist(txq->mbufs);
         if (id == 0)
@@ -1621,17 +1672,22 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
         if (txq->mbufs_cnt > NET_TX_RING_SIZE)
             panic("%s: tx_chain_cnt must be <= NET_TX_RING_SIZE\n",
                 __func__);
-        txq->mbufs[id] = m;
+        if (i == 0)
+            m_tag_prepend(m_head, &txq->refcount_tag[id].tag);
+        mbuf_grab((unsigned long)txq, m_head);
+        txq->mbufs[id] = m_head;
         tx->id = id;
         ref = gnttab_claim_grant_reference(&txq->gref_head);
         KASSERT((short)ref >= 0, ("Negative ref"));
-        mfn = virt_to_mfn(mtod(m, vm_offset_t));
+        mfn = atop(seg->ss_paddr);
         gnttab_grant_foreign_access_ref(ref, otherend_id,
             mfn, GNTMAP_readonly);
         tx->gref = txq->grant_ref[id] = ref;
-        tx->offset = mtod(m, vm_offset_t) & (PAGE_SIZE - 1);
+        tx->offset = seg->ss_paddr & PAGE_MASK;
+        KASSERT(tx->offset + seg->ss_len <= PAGE_SIZE,
+            ("mbuf segment crosses a page boundary"));
         tx->flags = 0;
-        if (m == m_head) {
+        if (i == 0) {
             /*
              * The first fragment has the entire packet
              * size, subsequent fragments have just the
@@ -1640,7 +1696,7 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
              * subtracting the sizes of the other
              * fragments.
              */
-            tx->size = m->m_pkthdr.len;
+            tx->size = m_head->m_pkthdr.len;

             /*
              * The first fragment contains the checksum flags
@@ -1654,12 +1710,12 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
              * so we have to test for CSUM_TSO
              * explicitly.
              */
-            if (m->m_pkthdr.csum_flags
+            if (m_head->m_pkthdr.csum_flags
                 & (CSUM_DELAY_DATA | CSUM_TSO)) {
                 tx->flags |= (NETTXF_csum_blank
                     | NETTXF_data_validated);
             }
-            if (m->m_pkthdr.csum_flags & CSUM_TSO) {
+            if (m_head->m_pkthdr.csum_flags & CSUM_TSO) {
                 struct netif_extra_info *gso =
                     (struct netif_extra_info *)
                     RING_GET_REQUEST(&txq->ring,
@@ -1667,7 +1723,7 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)

                 tx->flags |= NETTXF_extra_info;

-                gso->u.gso.size = m->m_pkthdr.tso_segsz;
+                gso->u.gso.size = m_head->m_pkthdr.tso_segsz;
                 gso->u.gso.type =
                     XEN_NETIF_GSO_TYPE_TCPV4;
                 gso->u.gso.pad = 0;
@@ -1677,9 +1733,9 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
                 gso->flags = 0;
             }
         } else {
-            tx->size = m->m_len;
+            tx->size = seg->ss_len;
         }
-        if (m->m_next)
+        if (i != nfrags - 1)
             tx->flags |= NETTXF_more_data;

         txq->ring.req_prod_pvt++;
[parent not found: <CAKhsbWY5=vENgwgq3NV44KSZQgpOPY=33CMSZo=jweAcRDjBwg@mail.gmail.com>]
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 [not found] ` <CAKhsbWY5=vENgwgq3NV44KSZQgpOPY=33CMSZo=jweAcRDjBwg@mail.gmail.com> @ 2021-12-29 8:32 ` Roger Pau Monné 2021-12-29 9:13 ` G.R. 0 siblings, 1 reply; 33+ messages in thread From: Roger Pau Monné @ 2021-12-29 8:32 UTC (permalink / raw) To: G.R.; +Cc: xen-devel Adding xen-devel back. On Wed, Dec 29, 2021 at 01:44:18AM +0800, G.R. wrote: > On Tue, Dec 28, 2021 at 3:05 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > On Sun, Dec 26, 2021 at 02:06:55AM +0800, G.R. wrote: > > > > > Thanks. I've raised this on freensd-net for advice [0]. IMO netfront > > > > > shouldn't receive an mbuf that crosses a page boundary, but if that's > > > > > indeed a legit mbuf I will figure out the best way to handle it. > > > > > > > > > > I have a clumsy patch (below) that might solve this, if you want to > > > > > give it a try. > > > > > > > > Applied the patch and it worked like a charm! > > > > Thank you so much for your quick help! > > > > Wish you a wonderful holiday! > > > > > > I may have said too quickly... > > > With the patch I can attach the iscsi disk and neither the dom0 nor > > > the NAS domU complains this time. > > > But when I attempt to mount the attached disk it reports I/O errors randomly. > > > By randomly I mean different disks behave differently... > > > I don't see any error logs from kernels this time. > > > (most of the iscsi disks are NTFS FS and mounted through the user mode > > > fuse library) > > > But since I have a local backup copy of the image, I can confirm that > > > mounting that backup image does not result in any I/O error. > > > Looks like something is still broken here... > > > > Indeed. That patch was likely too simple, and didn't properly handle > > the split of mbuf data buffers. > > > > I have another version based on using sglist, which I think it's also > > a worthwhile change for netfront. Can you please give it a try? 
I've > > done a very simple test and seems fine, but you certainly have more > > interesting cases. > > > > You will have to apply it on top of a clean tree, without any of the > > other patches applied. > > Unfortunately this new version is even worse. > It not only does not fix the known issue on iSCSI, but also creating > regression on NFS. > The regression on NFS is kind of random that it takes a > non-deterministic time to show up. > Here is a stack trace for reference: > db:0:kdb.enter.default> bt > Tracing pid 1696 tid 100622 td 0xfffff800883d5740 > kdb_enter() at kdb_enter+0x37/frame 0xfffffe009f80d900 > vpanic() at vpanic+0x197/frame 0xfffffe009f80d950 > panic() at panic+0x43/frame 0xfffffe009f80d9b0 > xn_txq_mq_start_locked() at xn_txq_mq_start_locked+0x5bc/frame > 0xfffffe009f80da50 I think this is hitting a KASSERT, could you paste the text printed as part of the panic (not just he backtrace)? Sorry this is taking a bit of time to solve. Thanks! ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2021-12-29 8:32 ` Roger Pau Monné @ 2021-12-29 9:13 ` G.R. 2021-12-29 10:27 ` Roger Pau Monné 0 siblings, 1 reply; 33+ messages in thread From: G.R. @ 2021-12-29 9:13 UTC (permalink / raw) To: Roger Pau Monné; +Cc: xen-devel [-- Attachment #1: Type: text/plain, Size: 419 bytes --] > > I think this is hitting a KASSERT, could you paste the text printed as > part of the panic (not just he backtrace)? > > Sorry this is taking a bit of time to solve. > > Thanks! > Sorry that I didn't make it clear in the first place. It is the same cross boundary assertion. Also sorry about the email format if it mess up in your side. I am typing in the Gmail app and don't find a way to switch to plain text. > [-- Attachment #2: Type: text/html, Size: 866 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2021-12-29 9:13 ` G.R. @ 2021-12-29 10:27 ` Roger Pau Monné 2021-12-29 19:07 ` Roger Pau Monné 0 siblings, 1 reply; 33+ messages in thread From: Roger Pau Monné @ 2021-12-29 10:27 UTC (permalink / raw) To: G.R.; +Cc: xen-devel On Wed, Dec 29, 2021 at 05:13:00PM +0800, G.R. wrote: > > > > I think this is hitting a KASSERT, could you paste the text printed as > > part of the panic (not just he backtrace)? > > > > Sorry this is taking a bit of time to solve. > > > > Thanks! > > > Sorry that I didn't make it clear in the first place. > It is the same cross boundary assertion. I see. After looking at the code it seems like sglist will coalesce contiguous physical ranges without taking page boundaries into account, which is not suitable for our purpose here. I guess I will either have to modify sglist, or switch to using bus_dma. The main problem with using bus_dma is that it will require bigger changes to netfront I think. Thanks, Roger. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2021-12-29 10:27 ` Roger Pau Monné @ 2021-12-29 19:07 ` Roger Pau Monné 2021-12-30 15:12 ` G.R. 0 siblings, 1 reply; 33+ messages in thread From: Roger Pau Monné @ 2021-12-29 19:07 UTC (permalink / raw) To: G.R.; +Cc: xen-devel On Wed, Dec 29, 2021 at 11:27:50AM +0100, Roger Pau Monné wrote: > On Wed, Dec 29, 2021 at 05:13:00PM +0800, G.R. wrote: > > > > > > I think this is hitting a KASSERT, could you paste the text printed as > > > part of the panic (not just he backtrace)? > > > > > > Sorry this is taking a bit of time to solve. > > > > > > Thanks! > > > > > Sorry that I didn't make it clear in the first place. > > It is the same cross boundary assertion. > > I see. After looking at the code it seems like sglist will coalesce > contiguous physical ranges without taking page boundaries into > account, which is not suitable for our purpose here. I guess I will > either have to modify sglist, or switch to using bus_dma. The main > problem with using bus_dma is that it will require bigger changes to > netfront I think. I have a crappy patch to use bus_dma. It's not yet ready for upstream but you might want to give it a try to see if it solves the cross page boundary issues. Thanks, Roger. --- diff --git a/sys/dev/xen/netfront/netfront.c b/sys/dev/xen/netfront/netfront.c index 8dba5a8dc6d5..693ef25c8783 100644 --- a/sys/dev/xen/netfront/netfront.c +++ b/sys/dev/xen/netfront/netfront.c @@ -71,6 +71,8 @@ __FBSDID("$FreeBSD$"); #include <xen/interface/io/netif.h> #include <xen/xenbus/xenbusvar.h> +#include <machine/bus.h> + #include "xenbus_if.h" /* Features supported by all backends. 
TSO and LRO can be negotiated */ @@ -199,6 +201,12 @@ struct netfront_txq { struct taskqueue *tq; struct task defrtask; + bus_dmamap_t dma_map; + struct mbuf_refcount { + struct m_tag tag; + u_int count; + } refcount_tag[NET_TX_RING_SIZE + 1]; + bool full; }; @@ -221,6 +229,8 @@ struct netfront_info { struct ifmedia sc_media; + bus_dma_tag_t dma_tag; + bool xn_reset; }; @@ -301,6 +311,39 @@ xn_get_rx_ref(struct netfront_rxq *rxq, RING_IDX ri) return (ref); } +#define MTAG_COOKIE 1218492000 +#define MTAG_REFCOUNT 0 + +static void mbuf_grab(struct mbuf *m) +{ + struct mbuf_refcount *ref; + + ref = (struct mbuf_refcount *)m_tag_locate(m, MTAG_COOKIE, + MTAG_REFCOUNT, NULL); + KASSERT(ref != NULL, ("Cannot find refcount")); + ref->count++; +} + +static void mbuf_release(struct mbuf *m) +{ + struct mbuf_refcount *ref; + + ref = (struct mbuf_refcount *)m_tag_locate(m, MTAG_COOKIE, + MTAG_REFCOUNT, NULL); + KASSERT(ref != NULL, ("Cannot find refcount")); + KASSERT(ref->count > 0, ("Invalid reference count")); + + if (--ref->count == 0) + m_freem(m); +} + +static void tag_free(struct m_tag *t) +{ + struct mbuf_refcount *ref = (struct mbuf_refcount *)t; + + KASSERT(ref->count == 0, ("Free mbuf tag with pending refcnt")); +} + #define IPRINTK(fmt, args...) 
\ printf("[XEN] " fmt, ##args) #ifdef INVARIANTS @@ -783,6 +826,7 @@ destroy_txq(struct netfront_txq *txq) buf_ring_free(txq->br, M_DEVBUF); taskqueue_drain_all(txq->tq); taskqueue_free(txq->tq); + bus_dmamap_destroy(txq->info->dma_tag, txq->dma_map); } static void @@ -826,6 +870,11 @@ setup_txqs(device_t dev, struct netfront_info *info, for (i = 0; i <= NET_TX_RING_SIZE; i++) { txq->mbufs[i] = (void *) ((u_long) i+1); txq->grant_ref[i] = GRANT_REF_INVALID; + m_tag_setup(&txq->refcount_tag[i].tag, + MTAG_COOKIE, MTAG_REFCOUNT, + sizeof(txq->refcount_tag[i]) - + sizeof(txq->refcount_tag[i].tag)); + txq->refcount_tag[i].tag.m_tag_free = &tag_free; } txq->mbufs[NET_TX_RING_SIZE] = (void *)0; @@ -874,10 +923,18 @@ setup_txqs(device_t dev, struct netfront_info *info, device_printf(dev, "xen_intr_alloc_and_bind_local_port failed\n"); goto fail_bind_port; } + + error = bus_dmamap_create(info->dma_tag, 0, &txq->dma_map); + if (error != 0) { + device_printf(dev, "failed to create dma map\n"); + goto fail_dma_map; + } } return (0); +fail_dma_map: + xen_intr_unbind(&txq->xen_intr_handle); fail_bind_port: taskqueue_drain_all(txq->tq); fail_start_thread: @@ -1041,7 +1098,7 @@ xn_release_tx_bufs(struct netfront_txq *txq) if (txq->mbufs_cnt < 0) { panic("%s: tx_chain_cnt must be >= 0", __func__); } - m_free(m); + mbuf_release(m); } } @@ -1311,7 +1368,7 @@ xn_txeof(struct netfront_txq *txq) txq->mbufs[id] = NULL; add_id_to_freelist(txq->mbufs, id); txq->mbufs_cnt--; - m_free(m); + mbuf_release(m); /* Only mark the txq active if we've freed up at least one slot to try */ ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; } @@ -1530,27 +1587,18 @@ xn_count_frags(struct mbuf *m) static int xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head) { - struct mbuf *m; struct netfront_info *np = txq->info; struct ifnet *ifp = np->xn_ifp; - u_int nfrags; - int otherend_id; + int otherend_id, error, nfrags; + unsigned int i; + bus_dma_segment_t segs[MAX_TX_REQ_FRAGS]; - /** - * Defragment 
the mbuf if necessary. - */ - nfrags = xn_count_frags(m_head); + error = bus_dmamap_load_mbuf_sg(np->dma_tag, txq->dma_map, m_head, + segs, &nfrags, 0); + if (error == EFBIG || nfrags > np->maxfrags) { + struct mbuf *m; - /* - * Check to see whether this request is longer than netback - * can handle, and try to defrag it. - */ - /** - * It is a bit lame, but the netback driver in Linux can't - * deal with nfrags > MAX_TX_REQ_FRAGS, which is a quirk of - * the Linux network stack. - */ - if (nfrags > np->maxfrags) { + bus_dmamap_unload(np->dma_tag, txq->dma_map); m = m_defrag(m_head, M_NOWAIT); if (!m) { /* @@ -1561,15 +1609,18 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head) return (EMSGSIZE); } m_head = m; + error = bus_dmamap_load_mbuf_sg(np->dma_tag, txq->dma_map, + m_head, segs, &nfrags, 0); + if (error != 0 || nfrags > np->maxfrags) { + bus_dmamap_unload(np->dma_tag, txq->dma_map); + m_freem(m_head); + return (error ?: EFBIG); + } + } else if (error != 0) { + m_freem(m_head); + return (error); } - /* Determine how many fragments now exist */ - nfrags = xn_count_frags(m_head); - - /* - * Check to see whether the defragmented packet has too many - * segments for the Linux netback driver. - */ /** * The FreeBSD TCP stack, with TSO enabled, can produce a chain * of mbufs longer than Linux can handle. Make sure we don't @@ -1604,9 +1655,8 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head) * the fragment pointers. Stop when we run out * of fragments or hit the end of the mbuf chain. 
*/ - m = m_head; otherend_id = xenbus_get_otherend_id(np->xbdev); - for (m = m_head; m; m = m->m_next) { + for (i = 0; i < nfrags; i++) { netif_tx_request_t *tx; uintptr_t id; grant_ref_t ref; @@ -1621,17 +1671,22 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head) if (txq->mbufs_cnt > NET_TX_RING_SIZE) panic("%s: tx_chain_cnt must be <= NET_TX_RING_SIZE\n", __func__); - txq->mbufs[id] = m; + if (i == 0) + m_tag_prepend(m_head, &txq->refcount_tag[id].tag); + mbuf_grab(m_head); + txq->mbufs[id] = m_head; tx->id = id; ref = gnttab_claim_grant_reference(&txq->gref_head); KASSERT((short)ref >= 0, ("Negative ref")); - mfn = virt_to_mfn(mtod(m, vm_offset_t)); + mfn = atop(segs[i].ds_addr); gnttab_grant_foreign_access_ref(ref, otherend_id, mfn, GNTMAP_readonly); tx->gref = txq->grant_ref[id] = ref; - tx->offset = mtod(m, vm_offset_t) & (PAGE_SIZE - 1); + tx->offset = segs[i].ds_addr & PAGE_MASK; + KASSERT(tx->offset + segs[i].ds_len <= PAGE_SIZE, + ("mbuf segment crosses a page boundary")); tx->flags = 0; - if (m == m_head) { + if (i == 0) { /* * The first fragment has the entire packet * size, subsequent fragments have just the @@ -1640,7 +1695,7 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head) * subtracting the sizes of the other * fragments. */ - tx->size = m->m_pkthdr.len; + tx->size = m_head->m_pkthdr.len; /* * The first fragment contains the checksum flags @@ -1654,12 +1709,12 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head) * so we have to test for CSUM_TSO * explicitly. 
*/ - if (m->m_pkthdr.csum_flags + if (m_head->m_pkthdr.csum_flags & (CSUM_DELAY_DATA | CSUM_TSO)) { tx->flags |= (NETTXF_csum_blank | NETTXF_data_validated); } - if (m->m_pkthdr.csum_flags & CSUM_TSO) { + if (m_head->m_pkthdr.csum_flags & CSUM_TSO) { struct netif_extra_info *gso = (struct netif_extra_info *) RING_GET_REQUEST(&txq->ring, @@ -1667,7 +1722,7 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head) tx->flags |= NETTXF_extra_info; - gso->u.gso.size = m->m_pkthdr.tso_segsz; + gso->u.gso.size = m_head->m_pkthdr.tso_segsz; gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4; gso->u.gso.pad = 0; @@ -1677,13 +1732,14 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head) gso->flags = 0; } } else { - tx->size = m->m_len; + tx->size = segs[i].ds_len; } - if (m->m_next) + if (i != nfrags - 1) tx->flags |= NETTXF_more_data; txq->ring.req_prod_pvt++; } + bus_dmamap_unload(np->dma_tag, txq->dma_map); BPF_MTAP(ifp, m_head); if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1); @@ -2244,7 +2300,20 @@ create_netdev(device_t dev) ether_ifattach(ifp, np->mac); netfront_carrier_off(np); - return (0); + err = bus_dma_tag_create( + bus_get_dma_tag(dev), /* parent */ + 1, PAGE_SIZE, /* algnmnt, boundary */ + BUS_SPACE_MAXADDR, /* lowaddr */ + BUS_SPACE_MAXADDR, /* highaddr */ + NULL, NULL, /* filter, filterarg */ + PAGE_SIZE * MAX_TX_REQ_FRAGS, /* max request size */ + MAX_TX_REQ_FRAGS, /* max segments */ + PAGE_SIZE, /* maxsegsize */ + BUS_DMA_ALLOCNOW, /* flags */ + NULL, NULL, /* lockfunc, lockarg */ + &np->dma_tag); + + return (err); error: KASSERT(err != 0, ("Error path with no error code specified")); @@ -2277,6 +2346,7 @@ netif_free(struct netfront_info *np) if_free(np->xn_ifp); np->xn_ifp = NULL; ifmedia_removeall(&np->sc_media); + bus_dma_tag_destroy(np->dma_tag); } static void ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2021-12-29 19:07 ` Roger Pau Monné @ 2021-12-30 15:12 ` G.R. 2021-12-30 18:51 ` Roger Pau Monné 0 siblings, 1 reply; 33+ messages in thread From: G.R. @ 2021-12-30 15:12 UTC (permalink / raw) To: Roger Pau Monné; +Cc: xen-devel On Thu, Dec 30, 2021 at 3:07 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > On Wed, Dec 29, 2021 at 11:27:50AM +0100, Roger Pau Monné wrote: > > On Wed, Dec 29, 2021 at 05:13:00PM +0800, G.R. wrote: > > > > > > > > I think this is hitting a KASSERT, could you paste the text printed as > > > > part of the panic (not just he backtrace)? > > > > > > > > Sorry this is taking a bit of time to solve. > > > > > > > > Thanks! > > > > > > > Sorry that I didn't make it clear in the first place. > > > It is the same cross boundary assertion. > > > > I see. After looking at the code it seems like sglist will coalesce > > contiguous physical ranges without taking page boundaries into > > account, which is not suitable for our purpose here. I guess I will > > either have to modify sglist, or switch to using bus_dma. The main > > problem with using bus_dma is that it will require bigger changes to > > netfront I think. > > I have a crappy patch to use bus_dma. It's not yet ready for upstream > but you might want to give it a try to see if it solves the cross page > boundary issues. > I think this version is better. It fixed the mbuf cross boundary issue and allowed me to boot from one disk image successfully. But seems like this patch is not stable enough yet and has its own issue -- memory is not properly released? The stack trace is likely not useful, but anyway... 
Context: pmap_growkernel: no memory to grow kernel <118>Dec 30 22:55:47 nas kernel[2164]: Last message 'pid 1066 (python3.9)' repeated 1 times, suppressed by syslog-ng on nas.rglab.us <118>Dec 30 22:55:47 nas kernel: pid 2086 (python3.9), jid 0, uid 0, was killed: out of swap space panic: pmap_growkernel: no memory to grow kernel cpuid = 1 time = 1640876153 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe009b971210 vpanic() at vpanic+0x17b/frame 0xfffffe009b971260 panic() at panic+0x43/frame 0xfffffe009b9712c0 pmap_growkernel() at pmap_growkernel+0x2f1/frame 0xfffffe009b971300 vm_map_insert() at vm_map_insert+0x27b/frame 0xfffffe009b971390 vm_map_find() at vm_map_find+0x5ed/frame 0xfffffe009b971470 kva_import() at kva_import+0x3c/frame 0xfffffe009b9714b0 vmem_try_fetch() at vmem_try_fetch+0xde/frame 0xfffffe009b971500 vmem_xalloc() at vmem_xalloc+0x4db/frame 0xfffffe009b971580 kva_import_domain() at kva_import_domain+0x36/frame 0xfffffe009b9715b0 vmem_try_fetch() at vmem_try_fetch+0xde/frame 0xfffffe009b971600 vmem_xalloc() at vmem_xalloc+0x4db/frame 0xfffffe009b971680 vmem_alloc() at vmem_alloc+0x8a/frame 0xfffffe009b9716d0 kmem_malloc_domainset() at kmem_malloc_domainset+0x92/frame 0xfffffe009b971740 keg_alloc_slab() at keg_alloc_slab+0xfa/frame 0xfffffe009b9717a0 keg_fetch_slab() at keg_fetch_slab+0xfe/frame 0xfffffe009b971830 zone_fetch_slab() at zone_fetch_slab+0x61/frame 0xfffffe009b971870 zone_import() at zone_import+0x75/frame 0xfffffe009b9718f0 zone_alloc_item() at zone_alloc_item+0x56/frame 0xfffffe009b971930 abd_borrow_buf() at abd_borrow_buf+0x1f/frame 0xfffffe009b971950 vdev_geom_io_start() at vdev_geom_io_start+0x189/frame 0xfffffe009b971980 zio_vdev_io_start() at zio_vdev_io_start+0x1e4/frame 0xfffffe009b9719d0 zio_nowait() at zio_nowait+0x11a/frame 0xfffffe009b971a30 vdev_queue_io_done() at vdev_queue_io_done+0x1b8/frame 0xfffffe009b971a90 zio_vdev_io_done() at zio_vdev_io_done+0xe3/frame 0xfffffe009b971ad0 
zio_execute() at zio_execute+0x6a/frame 0xfffffe009b971b20 taskqueue_run_locked() at taskqueue_run_locked+0x168/frame 0xfffffe009b971b80 taskqueue_thread_loop() at taskqueue_thread_loop+0x94/frame 0xfffffe009b971bb0 fork_exit() at fork_exit+0x80/frame 0xfffffe009b971bf0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009b971bf0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2021-12-30 15:12 ` G.R. @ 2021-12-30 18:51 ` Roger Pau Monné 2021-12-31 14:47 ` G.R. 0 siblings, 1 reply; 33+ messages in thread From: Roger Pau Monné @ 2021-12-30 18:51 UTC (permalink / raw) To: G.R.; +Cc: xen-devel On Thu, Dec 30, 2021 at 11:12:57PM +0800, G.R. wrote: > On Thu, Dec 30, 2021 at 3:07 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > On Wed, Dec 29, 2021 at 11:27:50AM +0100, Roger Pau Monné wrote: > > > On Wed, Dec 29, 2021 at 05:13:00PM +0800, G.R. wrote: > > > > > > > > > > I think this is hitting a KASSERT, could you paste the text printed as > > > > > part of the panic (not just he backtrace)? > > > > > > > > > > Sorry this is taking a bit of time to solve. > > > > > > > > > > Thanks! > > > > > > > > > Sorry that I didn't make it clear in the first place. > > > > It is the same cross boundary assertion. > > > > > > I see. After looking at the code it seems like sglist will coalesce > > > contiguous physical ranges without taking page boundaries into > > > account, which is not suitable for our purpose here. I guess I will > > > either have to modify sglist, or switch to using bus_dma. The main > > > problem with using bus_dma is that it will require bigger changes to > > > netfront I think. > > > > I have a crappy patch to use bus_dma. It's not yet ready for upstream > > but you might want to give it a try to see if it solves the cross page > > boundary issues. > > > I think this version is better. Thanks for all the testing. > It fixed the mbuf cross boundary issue and allowed me to boot from one > disk image successfully. It's good to know it seems to handle splitting mbufs fragments at page boundaries correctly. > But seems like this patch is not stable enough yet and has its own > issue -- memory is not properly released? I know. I've been working on improving it this morning and I'm attaching an updated version below. Thanks, Roger. 
--- diff --git a/sys/dev/xen/netfront/netfront.c b/sys/dev/xen/netfront/netfront.c index 8dba5a8dc6d5..69528cc39b94 100644 --- a/sys/dev/xen/netfront/netfront.c +++ b/sys/dev/xen/netfront/netfront.c @@ -71,6 +71,8 @@ __FBSDID("$FreeBSD$"); #include <xen/interface/io/netif.h> #include <xen/xenbus/xenbusvar.h> +#include <machine/bus.h> + #include "xenbus_if.h" /* Features supported by all backends. TSO and LRO can be negotiated */ @@ -199,6 +201,17 @@ struct netfront_txq { struct taskqueue *tq; struct task defrtask; + bus_dma_segment_t segs[MAX_TX_REQ_FRAGS]; + struct mbuf_xennet { + struct m_tag tag; + bus_dma_tag_t dma_tag; + bus_dmamap_t dma_map; + struct netfront_txq *txq; + SLIST_ENTRY(mbuf_xennet) next; + u_int count; + } xennet_tag[NET_TX_RING_SIZE + 1]; + SLIST_HEAD(, mbuf_xennet) tags; + bool full; }; @@ -221,6 +234,8 @@ struct netfront_info { struct ifmedia sc_media; + bus_dma_tag_t dma_tag; + bool xn_reset; }; @@ -301,6 +316,42 @@ xn_get_rx_ref(struct netfront_rxq *rxq, RING_IDX ri) return (ref); } +#define MTAG_COOKIE 1218492000 +#define MTAG_XENNET 0 + +static void mbuf_grab(struct mbuf *m) +{ + struct mbuf_xennet *ref; + + ref = (struct mbuf_xennet *)m_tag_locate(m, MTAG_COOKIE, + MTAG_XENNET, NULL); + KASSERT(ref != NULL, ("Cannot find refcount")); + ref->count++; +} + +static void mbuf_release(struct mbuf *m) +{ + struct mbuf_xennet *ref; + + ref = (struct mbuf_xennet *)m_tag_locate(m, MTAG_COOKIE, + MTAG_XENNET, NULL); + KASSERT(ref != NULL, ("Cannot find refcount")); + KASSERT(ref->count > 0, ("Invalid reference count")); + + if (--ref->count == 0) + m_freem(m); +} + +static void tag_free(struct m_tag *t) +{ + struct mbuf_xennet *ref = (struct mbuf_xennet *)t; + + KASSERT(ref->count == 0, ("Free mbuf tag with pending refcnt")); + bus_dmamap_sync(ref->dma_tag, ref->dma_map, BUS_DMASYNC_POSTWRITE); + bus_dmamap_destroy(ref->dma_tag, ref->dma_map); + SLIST_INSERT_HEAD(&ref->txq->tags, ref, next); +} + #define IPRINTK(fmt, args...) 
\ printf("[XEN] " fmt, ##args) #ifdef INVARIANTS @@ -778,11 +829,18 @@ disconnect_txq(struct netfront_txq *txq) static void destroy_txq(struct netfront_txq *txq) { + unsigned int i; free(txq->ring.sring, M_DEVBUF); buf_ring_free(txq->br, M_DEVBUF); taskqueue_drain_all(txq->tq); taskqueue_free(txq->tq); + + for (i = 0; i <= NET_TX_RING_SIZE; i++) { + bus_dmamap_destroy(txq->info->dma_tag, + txq->xennet_tag[i].dma_map); + txq->xennet_tag[i].dma_map = NULL; + } } static void @@ -822,10 +880,27 @@ setup_txqs(device_t dev, struct netfront_info *info, mtx_init(&txq->lock, txq->name, "netfront transmit lock", MTX_DEF); + SLIST_INIT(&txq->tags); for (i = 0; i <= NET_TX_RING_SIZE; i++) { txq->mbufs[i] = (void *) ((u_long) i+1); txq->grant_ref[i] = GRANT_REF_INVALID; + txq->xennet_tag[i].txq = txq; + txq->xennet_tag[i].dma_tag = info->dma_tag; + error = bus_dmamap_create(info->dma_tag, 0, + &txq->xennet_tag[i].dma_map); + if (error != 0) { + device_printf(dev, + "failed to allocate dma map\n"); + goto fail; + } + m_tag_setup(&txq->xennet_tag[i].tag, + MTAG_COOKIE, MTAG_XENNET, + sizeof(txq->xennet_tag[i]) - + sizeof(txq->xennet_tag[i].tag)); + txq->xennet_tag[i].tag.m_tag_free = &tag_free; + SLIST_INSERT_HEAD(&txq->tags, &txq->xennet_tag[i], + next); } txq->mbufs[NET_TX_RING_SIZE] = (void *)0; @@ -1041,7 +1116,7 @@ xn_release_tx_bufs(struct netfront_txq *txq) if (txq->mbufs_cnt < 0) { panic("%s: tx_chain_cnt must be >= 0", __func__); } - m_free(m); + mbuf_release(m); } } @@ -1311,7 +1386,7 @@ xn_txeof(struct netfront_txq *txq) txq->mbufs[id] = NULL; add_id_to_freelist(txq->mbufs, id); txq->mbufs_cnt--; - m_free(m); + mbuf_release(m); /* Only mark the txq active if we've freed up at least one slot to try */ ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; } @@ -1530,46 +1605,51 @@ xn_count_frags(struct mbuf *m) static int xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head) { - struct mbuf *m; struct netfront_info *np = txq->info; struct ifnet *ifp = np->xn_ifp; - 
u_int nfrags; - int otherend_id; + int otherend_id, error, nfrags; + bus_dma_segment_t *segs; + struct mbuf_xennet *tag; + bus_dmamap_t map; + unsigned int i; - /** - * Defragment the mbuf if necessary. - */ - nfrags = xn_count_frags(m_head); + segs = txq->segs; + KASSERT(!SLIST_EMPTY(&txq->tags), ("no tags available")); + tag = SLIST_FIRST(&txq->tags); + SLIST_REMOVE_HEAD(&txq->tags, next); + KASSERT(tag->count == 0, ("tag already in-use")); + map = tag->dma_map; + error = bus_dmamap_load_mbuf_sg(np->dma_tag, map, m_head, segs, + &nfrags, 0); + if (error == EFBIG || nfrags > np->maxfrags) { + struct mbuf *m; - /* - * Check to see whether this request is longer than netback - * can handle, and try to defrag it. - */ - /** - * It is a bit lame, but the netback driver in Linux can't - * deal with nfrags > MAX_TX_REQ_FRAGS, which is a quirk of - * the Linux network stack. - */ - if (nfrags > np->maxfrags) { + bus_dmamap_unload(np->dma_tag, map); m = m_defrag(m_head, M_NOWAIT); if (!m) { /* * Defrag failed, so free the mbuf and * therefore drop the packet. */ + SLIST_INSERT_HEAD(&txq->tags, tag, next); m_freem(m_head); return (EMSGSIZE); } m_head = m; + error = bus_dmamap_load_mbuf_sg(np->dma_tag, map, m_head, segs, + &nfrags, 0); + if (error != 0 || nfrags > np->maxfrags) { + bus_dmamap_unload(np->dma_tag, map); + SLIST_INSERT_HEAD(&txq->tags, tag, next); + m_freem(m_head); + return (error ?: EFBIG); + } + } else if (error != 0) { + SLIST_INSERT_HEAD(&txq->tags, tag, next); + m_freem(m_head); + return (error); } - /* Determine how many fragments now exist */ - nfrags = xn_count_frags(m_head); - - /* - * Check to see whether the defragmented packet has too many - * segments for the Linux netback driver. - */ /** * The FreeBSD TCP stack, with TSO enabled, can produce a chain * of mbufs longer than Linux can handle. 
Make sure we don't
@@ -1583,6 +1663,8 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
 		    "won't be able to handle it, dropping\n",
 		    __func__, nfrags, MAX_TX_REQ_FRAGS);
 #endif
+		SLIST_INSERT_HEAD(&txq->tags, tag, next);
+		bus_dmamap_unload(np->dma_tag, map);
 		m_freem(m_head);
 		return (EMSGSIZE);
 	}
@@ -1604,9 +1686,9 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
 	 * the fragment pointers. Stop when we run out
 	 * of fragments or hit the end of the mbuf chain.
 	 */
-	m = m_head;
 	otherend_id = xenbus_get_otherend_id(np->xbdev);
-	for (m = m_head; m; m = m->m_next) {
+	m_tag_prepend(m_head, &tag->tag);
+	for (i = 0; i < nfrags; i++) {
 		netif_tx_request_t *tx;
 		uintptr_t id;
 		grant_ref_t ref;
@@ -1621,17 +1703,20 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
 		if (txq->mbufs_cnt > NET_TX_RING_SIZE)
 			panic("%s: tx_chain_cnt must be <= NET_TX_RING_SIZE\n",
 			    __func__);
-		txq->mbufs[id] = m;
+		mbuf_grab(m_head);
+		txq->mbufs[id] = m_head;
 		tx->id = id;
 		ref = gnttab_claim_grant_reference(&txq->gref_head);
 		KASSERT((short)ref >= 0, ("Negative ref"));
-		mfn = virt_to_mfn(mtod(m, vm_offset_t));
+		mfn = atop(segs[i].ds_addr);
 		gnttab_grant_foreign_access_ref(ref, otherend_id,
 		    mfn, GNTMAP_readonly);
 		tx->gref = txq->grant_ref[id] = ref;
-		tx->offset = mtod(m, vm_offset_t) & (PAGE_SIZE - 1);
+		tx->offset = segs[i].ds_addr & PAGE_MASK;
+		KASSERT(tx->offset + segs[i].ds_len <= PAGE_SIZE,
+		    ("mbuf segment crosses a page boundary"));
 		tx->flags = 0;
-		if (m == m_head) {
+		if (i == 0) {
 			/*
 			 * The first fragment has the entire packet
 			 * size, subsequent fragments have just the
@@ -1640,7 +1725,7 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
 			 * subtracting the sizes of the other
 			 * fragments.
 			 */
-			tx->size = m->m_pkthdr.len;
+			tx->size = m_head->m_pkthdr.len;

 			/*
 			 * The first fragment contains the checksum flags
@@ -1654,12 +1739,12 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
 			 * so we have to test for CSUM_TSO
 			 * explicitly.
 			 */
-			if (m->m_pkthdr.csum_flags
+			if (m_head->m_pkthdr.csum_flags
 			    & (CSUM_DELAY_DATA | CSUM_TSO)) {
 				tx->flags |= (NETTXF_csum_blank
 				    | NETTXF_data_validated);
 			}
-			if (m->m_pkthdr.csum_flags & CSUM_TSO) {
+			if (m_head->m_pkthdr.csum_flags & CSUM_TSO) {
 				struct netif_extra_info *gso =
 					(struct netif_extra_info *)
 					RING_GET_REQUEST(&txq->ring,
@@ -1667,7 +1752,7 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)

 				tx->flags |= NETTXF_extra_info;

-				gso->u.gso.size = m->m_pkthdr.tso_segsz;
+				gso->u.gso.size = m_head->m_pkthdr.tso_segsz;
 				gso->u.gso.type =
 					XEN_NETIF_GSO_TYPE_TCPV4;
 				gso->u.gso.pad = 0;
@@ -1677,13 +1762,14 @@ xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
 				gso->flags = 0;
 			}
 		} else {
-			tx->size = m->m_len;
+			tx->size = segs[i].ds_len;
 		}
-		if (m->m_next)
+		if (i != nfrags - 1)
 			tx->flags |= NETTXF_more_data;

 		txq->ring.req_prod_pvt++;
 	}
+	bus_dmamap_sync(np->dma_tag, map, BUS_DMASYNC_PREWRITE);
 	BPF_MTAP(ifp, m_head);

 	if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1);
@@ -2244,7 +2330,20 @@ create_netdev(device_t dev)
 	ether_ifattach(ifp, np->mac);
 	netfront_carrier_off(np);

-	return (0);
+	err = bus_dma_tag_create(
+	    bus_get_dma_tag(dev),		/* parent */
+	    1, PAGE_SIZE,			/* algnmnt, boundary */
+	    BUS_SPACE_MAXADDR,			/* lowaddr */
+	    BUS_SPACE_MAXADDR,			/* highaddr */
+	    NULL, NULL,				/* filter, filterarg */
+	    PAGE_SIZE * MAX_TX_REQ_FRAGS,	/* max request size */
+	    MAX_TX_REQ_FRAGS,			/* max segments */
+	    PAGE_SIZE,				/* maxsegsize */
+	    BUS_DMA_ALLOCNOW,			/* flags */
+	    NULL, NULL,				/* lockfunc, lockarg */
+	    &np->dma_tag);
+
+	return (err);

 error:
 	KASSERT(err != 0, ("Error path with no error code specified"));
@@ -2277,6 +2376,7 @@ netif_free(struct netfront_info *np)
 	if_free(np->xn_ifp);
 	np->xn_ifp = NULL;
 	ifmedia_removeall(&np->sc_media);
+	bus_dma_tag_destroy(np->dma_tag);
 }

 static void

^ permalink raw reply related	[flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2021-12-30 18:51 ` Roger Pau Monné @ 2021-12-31 14:47 ` G.R. 2022-01-04 10:25 ` Roger Pau Monné 0 siblings, 1 reply; 33+ messages in thread From: G.R. @ 2021-12-31 14:47 UTC (permalink / raw) To: Roger Pau Monné; +Cc: xen-devel On Fri, Dec 31, 2021 at 2:52 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > On Thu, Dec 30, 2021 at 11:12:57PM +0800, G.R. wrote: > > On Thu, Dec 30, 2021 at 3:07 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > > > On Wed, Dec 29, 2021 at 11:27:50AM +0100, Roger Pau Monné wrote: > > > > On Wed, Dec 29, 2021 at 05:13:00PM +0800, G.R. wrote: > > > > > > > > > > > > I think this is hitting a KASSERT, could you paste the text printed as > > > > > > part of the panic (not just he backtrace)? > > > > > > > > > > > > Sorry this is taking a bit of time to solve. > > > > > > > > > > > > Thanks! > > > > > > > > > > > Sorry that I didn't make it clear in the first place. > > > > > It is the same cross boundary assertion. > > > > > > > > I see. After looking at the code it seems like sglist will coalesce > > > > contiguous physical ranges without taking page boundaries into > > > > account, which is not suitable for our purpose here. I guess I will > > > > either have to modify sglist, or switch to using bus_dma. The main > > > > problem with using bus_dma is that it will require bigger changes to > > > > netfront I think. > > > > > > I have a crappy patch to use bus_dma. It's not yet ready for upstream > > > but you might want to give it a try to see if it solves the cross page > > > boundary issues. > > > > > I think this version is better. > > Thanks for all the testing. > > > It fixed the mbuf cross boundary issue and allowed me to boot from one > > disk image successfully. > > It's good to know it seems to handle splitting mbufs fragments at page > boundaries correctly. 
> > > But seems like this patch is not stable enough yet and has its own > > issue -- memory is not properly released? > > I know. I've been working on improving it this morning and I'm > attaching an updated version below. > Good news. With this new patch, the NAS domU can serve iSCSI disk without OOM panic, at least for a little while. I'm going to keep it up and running for a while to see if it's stable over time. BTW, an irrelevant question: What's the current status of HVM domU on top of storage driver domain? About 7 years ago, one user on the list was able to get this setup up and running with your help (patch).[1] When I attempted to reproduce a similar setup two years later, I discovered that the patch was not submitted. And even with that patch the setup cannot be reproduced successfully. We spent some time debugging on the problem together[2], but didn't bottom out the root cause at that time. In case it's still broken and you still have the interest and time, I can launch a separate thread on this topic and provide required testing environment. [1] https://lists.xenproject.org/archives/html/xen-users/2014-08/msg00003.html [2] https://xen-users.narkive.com/9ihP0QG4/hvm-domu-on-storage-driver-domain Thanks, G.R. > Thanks, Roger. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2021-12-31 14:47 ` G.R. @ 2022-01-04 10:25 ` Roger Pau Monné 2022-01-04 16:05 ` G.R. 0 siblings, 1 reply; 33+ messages in thread From: Roger Pau Monné @ 2022-01-04 10:25 UTC (permalink / raw) To: G.R.; +Cc: xen-devel On Fri, Dec 31, 2021 at 10:47:57PM +0800, G.R. wrote: > On Fri, Dec 31, 2021 at 2:52 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > On Thu, Dec 30, 2021 at 11:12:57PM +0800, G.R. wrote: > > > On Thu, Dec 30, 2021 at 3:07 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > > > > > On Wed, Dec 29, 2021 at 11:27:50AM +0100, Roger Pau Monné wrote: > > > > > On Wed, Dec 29, 2021 at 05:13:00PM +0800, G.R. wrote: > > > > > > > > > > > > > > I think this is hitting a KASSERT, could you paste the text printed as > > > > > > > part of the panic (not just he backtrace)? > > > > > > > > > > > > > > Sorry this is taking a bit of time to solve. > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > Sorry that I didn't make it clear in the first place. > > > > > > It is the same cross boundary assertion. > > > > > > > > > > I see. After looking at the code it seems like sglist will coalesce > > > > > contiguous physical ranges without taking page boundaries into > > > > > account, which is not suitable for our purpose here. I guess I will > > > > > either have to modify sglist, or switch to using bus_dma. The main > > > > > problem with using bus_dma is that it will require bigger changes to > > > > > netfront I think. > > > > > > > > I have a crappy patch to use bus_dma. It's not yet ready for upstream > > > > but you might want to give it a try to see if it solves the cross page > > > > boundary issues. > > > > > > > I think this version is better. > > > > Thanks for all the testing. > > > > > It fixed the mbuf cross boundary issue and allowed me to boot from one > > > disk image successfully. 
> > > > It's good to know it seems to handle splitting mbufs fragments at page > > boundaries correctly. > > > > > But seems like this patch is not stable enough yet and has its own > > > issue -- memory is not properly released? > > > > I know. I've been working on improving it this morning and I'm > > attaching an updated version below. > > > Good news. > With this new patch, the NAS domU can serve iSCSI disk without OOM > panic, at least for a little while. > I'm going to keep it up and running for a while to see if it's stable over time. Thanks again for all the testing. Do you see any difference performance wise? > BTW, an irrelevant question: > What's the current status of HVM domU on top of storage driver domain? > About 7 years ago, one user on the list was able to get this setup up > and running with your help (patch).[1] > When I attempted to reproduce a similar setup two years later, I > discovered that the patch was not submitted. > And even with that patch the setup cannot be reproduced successfully. > We spent some time debugging on the problem together[2], but didn't > bottom out the root cause at that time. > In case it's still broken and you still have the interest and time, I > can launch a separate thread on this topic and provide required > testing environment. Yes, better as a new thread please. FWIW, I haven't looked at this since a long time, but I recall some fixes in order to be able to use driver domains with HVM guests, which require attaching the disk to dom0 in order for the device model (QEMU) to access it. I would give it a try without using stubdomains and see what you get. You will need to run `xl devd` inside of the driver domain, so you will need to install xen-tools on the domU. There's an init script to launch `xl devd` at boot, it's called 'xendriverdomain'. Thanks, Roger. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2022-01-04 10:25 ` Roger Pau Monné @ 2022-01-04 16:05 ` G.R. 2022-01-05 14:33 ` Roger Pau Monné 0 siblings, 1 reply; 33+ messages in thread From: G.R. @ 2022-01-04 16:05 UTC (permalink / raw) To: Roger Pau Monné; +Cc: xen-devel > > > > But seems like this patch is not stable enough yet and has its own > > > > issue -- memory is not properly released? > > > > > > I know. I've been working on improving it this morning and I'm > > > attaching an updated version below. > > > > > Good news. > > With this new patch, the NAS domU can serve iSCSI disk without OOM > > panic, at least for a little while. > > I'm going to keep it up and running for a while to see if it's stable over time. > > Thanks again for all the testing. Do you see any difference > performance wise? I'm still on a *debug* kernel build to capture any potential panic -- none so far -- no performance testing yet. Since I'm a home user with a relatively lightweight workload, so far I didn't observe any difference in daily usage. I did some quick iperf3 testing just now. 1. between nas domU <=> Linux dom0 running on an old i7-3770 based box. The peak is roughly 12 Gbits/s when domU is the server. But I do see regression down to ~8.5 Gbits/s when I repeat the test in a short burst. The regression can recover when I leave the system idle for a while. When dom0 is the iperf3 server, the transfer rate is much lower, down all the way to 1.x Gbits/s. Sometimes, I can see the following kernel log repeats during the testing, likely contributing to the slowdown. 
interrupt storm detected on "irq2328:"; throttling interrupt source

Another thing that looks alarming is that the retransmission count is high:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   212 MBytes  1.78 Gbits/sec  110    231 KBytes
[  5]   1.00-2.00   sec   230 MBytes  1.92 Gbits/sec    1    439 KBytes
[  5]   2.00-3.00   sec   228 MBytes  1.92 Gbits/sec    3    335 KBytes
[  5]   3.00-4.00   sec   204 MBytes  1.71 Gbits/sec    1    486 KBytes
[  5]   4.00-5.00   sec   201 MBytes  1.69 Gbits/sec  812    258 KBytes
[  5]   5.00-6.00   sec   179 MBytes  1.51 Gbits/sec    1    372 KBytes
[  5]   6.00-7.00   sec  50.5 MBytes   423 Mbits/sec    2    154 KBytes
[  5]   7.00-8.00   sec   194 MBytes  1.63 Gbits/sec  339    172 KBytes
[  5]   8.00-9.00   sec   156 MBytes  1.30 Gbits/sec  854    215 KBytes
[  5]   9.00-10.00  sec   143 MBytes  1.20 Gbits/sec  997   93.8 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.76 GBytes  1.51 Gbits/sec  3120          sender
[  5]   0.00-10.45  sec  1.76 GBytes  1.44 Gbits/sec                receiver

2. between a remote box <=> nas domU, through a 1Gbps ethernet cable.
Roughly saturates the link when domU is the server, without obvious perf drop.
When domU is running as a client, the achieved BW is ~30 Mbps lower than the peak.
Retransmission sometimes also shows up in this scenario, more seriously when domU is the client.

I cannot test with the stock kernel nor with your patch in release mode immediately.
But according to the observed imbalance between the inbound and outbound paths, a non-trivial penalty applies, I guess?

> > BTW, an irrelevant question:
> > What's the current status of HVM domU on top of storage driver domain?
> > About 7 years ago, one user on the list was able to get this setup up
> > and running with your help (patch).[1]
> > When I attempted to reproduce a similar setup two years later, I
> > discovered that the patch was not submitted.
> > We spent some time debugging on the problem together[2], but didn't > > bottom out the root cause at that time. > > In case it's still broken and you still have the interest and time, I > > can launch a separate thread on this topic and provide required > > testing environment. > > Yes, better as a new thread please. > > FWIW, I haven't looked at this since a long time, but I recall some > fixes in order to be able to use driver domains with HVM guests, which > require attaching the disk to dom0 in order for the device model > (QEMU) to access it. > > I would give it a try without using stubdomains and see what you get. > You will need to run `xl devd` inside of the driver domain, so you > will need to install xen-tools on the domU. There's an init script to > launch `xl devd` at boot, it's called 'xendriverdomain'. Looks like I'm unlucky once again. Let's follow up in a separate thread. > Thanks, Roger. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2022-01-04 16:05 ` G.R. @ 2022-01-05 14:33 ` Roger Pau Monné 2022-01-07 17:14 ` G.R. 0 siblings, 1 reply; 33+ messages in thread From: Roger Pau Monné @ 2022-01-05 14:33 UTC (permalink / raw) To: G.R.; +Cc: xen-devel On Wed, Jan 05, 2022 at 12:05:39AM +0800, G.R. wrote: > > > > > But seems like this patch is not stable enough yet and has its own > > > > > issue -- memory is not properly released? > > > > > > > > I know. I've been working on improving it this morning and I'm > > > > attaching an updated version below. > > > > > > > Good news. > > > With this new patch, the NAS domU can serve iSCSI disk without OOM > > > panic, at least for a little while. > > > I'm going to keep it up and running for a while to see if it's stable over time. > > > > Thanks again for all the testing. Do you see any difference > > performance wise? > I'm still on a *debug* kernel build to capture any potential panic -- > none so far -- no performance testing yet. > Since I'm a home user with a relatively lightweight workload, so far I > didn't observe any difference in daily usage. > > I did some quick iperf3 testing just now. Thanks for doing this. > 1. between nas domU <=> Linux dom0 running on an old i7-3770 based box. > The peak is roughly 12 Gbits/s when domU is the server. > But I do see regression down to ~8.5 Gbits/s when I repeat the test in > a short burst. > The regression can recover when I leave the system idle for a while. > > When dom0 is the iperf3 server, the transfer rate is much lower, down > all the way to 1.x Gbits/s. > Sometimes, I can see the following kernel log repeats during the > testing, likely contributing to the slowdown. > interrupt storm detected on "irq2328:"; throttling interrupt source I assume the message is in the domU, not the dom0? 
> Another thing that looks alarming is the retransmission is high: > [ ID] Interval Transfer Bitrate Retr Cwnd > [ 5] 0.00-1.00 sec 212 MBytes 1.78 Gbits/sec 110 231 KBytes > [ 5] 1.00-2.00 sec 230 MBytes 1.92 Gbits/sec 1 439 KBytes > [ 5] 2.00-3.00 sec 228 MBytes 1.92 Gbits/sec 3 335 KBytes > [ 5] 3.00-4.00 sec 204 MBytes 1.71 Gbits/sec 1 486 KBytes > [ 5] 4.00-5.00 sec 201 MBytes 1.69 Gbits/sec 812 258 KBytes > [ 5] 5.00-6.00 sec 179 MBytes 1.51 Gbits/sec 1 372 KBytes > [ 5] 6.00-7.00 sec 50.5 MBytes 423 Mbits/sec 2 154 KBytes > [ 5] 7.00-8.00 sec 194 MBytes 1.63 Gbits/sec 339 172 KBytes > [ 5] 8.00-9.00 sec 156 MBytes 1.30 Gbits/sec 854 215 KBytes > [ 5] 9.00-10.00 sec 143 MBytes 1.20 Gbits/sec 997 93.8 KBytes > - - - - - - - - - - - - - - - - - - - - - - - - - > [ ID] Interval Transfer Bitrate Retr > [ 5] 0.00-10.00 sec 1.76 GBytes 1.51 Gbits/sec 3120 sender > [ 5] 0.00-10.45 sec 1.76 GBytes 1.44 Gbits/sec receiver Do you see the same when running the same tests on a debug kernel without my patch applied? (ie: a kernel build yourself from the same baseline but just without my patch applied) I'm mostly interested in knowing whether the patch itself causes any regressions from the current state (which might not be great already). > > 2. between a remote box <=> nas domU, through a 1Gbps ethernet cable. > Roughly saturate the link when domU is the server, without obvious perf drop > When domU running as a client, the achieved BW is ~30Mbps lower than the peak. > Retransmission sometimes also shows up in this scenario, more > seriously when domU is the client. > > I cannot test with the stock kernel nor with your patch in release > mode immediately. > > But according to the observed imbalance between inbounding and > outgoing path, non-trivial penalty applies I guess? We should get a baseline using the same sources without my path applied. Thanks, Roger. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2022-01-05 14:33 ` Roger Pau Monné @ 2022-01-07 17:14 ` G.R. 2022-01-10 14:53 ` Roger Pau Monné 0 siblings, 1 reply; 33+ messages in thread From: G.R. @ 2022-01-07 17:14 UTC (permalink / raw) To: Roger Pau Monné; +Cc: xen-devel On Wed, Jan 5, 2022 at 10:33 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > On Wed, Jan 05, 2022 at 12:05:39AM +0800, G.R. wrote: > > > > > > But seems like this patch is not stable enough yet and has its own > > > > > > issue -- memory is not properly released? > > > > > > > > > > I know. I've been working on improving it this morning and I'm > > > > > attaching an updated version below. > > > > > > > > > Good news. > > > > With this new patch, the NAS domU can serve iSCSI disk without OOM > > > > panic, at least for a little while. > > > > I'm going to keep it up and running for a while to see if it's stable over time. > > > > > > Thanks again for all the testing. Do you see any difference > > > performance wise? > > I'm still on a *debug* kernel build to capture any potential panic -- > > none so far -- no performance testing yet. > > Since I'm a home user with a relatively lightweight workload, so far I > > didn't observe any difference in daily usage. > > > > I did some quick iperf3 testing just now. > > Thanks for doing this. > > > 1. between nas domU <=> Linux dom0 running on an old i7-3770 based box. > > The peak is roughly 12 Gbits/s when domU is the server. > > But I do see regression down to ~8.5 Gbits/s when I repeat the test in > > a short burst. > > The regression can recover when I leave the system idle for a while. > > > > When dom0 is the iperf3 server, the transfer rate is much lower, down > > all the way to 1.x Gbits/s. > > Sometimes, I can see the following kernel log repeats during the > > testing, likely contributing to the slowdown. 
> > interrupt storm detected on "irq2328:"; throttling interrupt source
>
> I assume the message is in the domU, not the dom0?

Yes, in the TrueNAS domU.
BTW, I rebooted back to the stock kernel and the message is no longer observed.

With the stock kernel, the transfer rate from dom0 to the NAS domU can be as high as 30 Gbps.
The variation is still observed, sometimes down to ~19 Gbps. There is no retransmission in this direction.

For the reverse direction, the observed low transfer rate still exists.
It's still within the range of 1.x Gbps, but should still be better than the previous test.
The huge number of re-transmissions is still observed.
The same behavior can be observed on a stock FreeBSD 12.2 image, so this is not specific to TrueNAS.

According to the packet capture, the re-transmission appears to be caused by packet reordering.
Here is one example incident:
1. dom0 sees a sequence jump in the incoming stream and begins to send out SACKs
2. When the SACKs show up at domU, it begins to re-transmit the lost frames
(the re-transmit looks weird since it shows up as a mixed stream of
1448-byte and 12-byte packets, instead of always 1448 bytes)
3. Suddenly the packets that were believed to have been lost show up, and dom0
accepts them as if they were the re-transmission
4. The actual re-transmission finally shows up in dom0...
Should we expect packet reordering on a direct virtual link? Sounds fishy to me.
Any chance we can get this re-transmission fixed?

So it looks like at least the imbalance between the two directions is not
related to your patch.
Likely the debug build is a bigger contributor to the perf difference
in both directions.

I also tried your patch on a release build, and didn't observe any
major difference in iperf3 numbers.
They roughly match the 30 Gbps and 1.x Gbps numbers on the stock release kernel.
> > > Another thing that looks alarming is the retransmission is high: > > [ ID] Interval Transfer Bitrate Retr Cwnd > > [ 5] 0.00-1.00 sec 212 MBytes 1.78 Gbits/sec 110 231 KBytes > > [ 5] 1.00-2.00 sec 230 MBytes 1.92 Gbits/sec 1 439 KBytes > > [ 5] 2.00-3.00 sec 228 MBytes 1.92 Gbits/sec 3 335 KBytes > > [ 5] 3.00-4.00 sec 204 MBytes 1.71 Gbits/sec 1 486 KBytes > > [ 5] 4.00-5.00 sec 201 MBytes 1.69 Gbits/sec 812 258 KBytes > > [ 5] 5.00-6.00 sec 179 MBytes 1.51 Gbits/sec 1 372 KBytes > > [ 5] 6.00-7.00 sec 50.5 MBytes 423 Mbits/sec 2 154 KBytes > > [ 5] 7.00-8.00 sec 194 MBytes 1.63 Gbits/sec 339 172 KBytes > > [ 5] 8.00-9.00 sec 156 MBytes 1.30 Gbits/sec 854 215 KBytes > > [ 5] 9.00-10.00 sec 143 MBytes 1.20 Gbits/sec 997 93.8 KBytes > > - - - - - - - - - - - - - - - - - - - - - - - - - > > [ ID] Interval Transfer Bitrate Retr > > [ 5] 0.00-10.00 sec 1.76 GBytes 1.51 Gbits/sec 3120 sender > > [ 5] 0.00-10.45 sec 1.76 GBytes 1.44 Gbits/sec receiver > > Do you see the same when running the same tests on a debug kernel > without my patch applied? (ie: a kernel build yourself from the same > baseline but just without my patch applied) > > I'm mostly interested in knowing whether the patch itself causes any > regressions from the current state (which might not be great already). > > > > > 2. between a remote box <=> nas domU, through a 1Gbps ethernet cable. > > Roughly saturate the link when domU is the server, without obvious perf drop > > When domU running as a client, the achieved BW is ~30Mbps lower than the peak. > > Retransmission sometimes also shows up in this scenario, more > > seriously when domU is the client. > > > > I cannot test with the stock kernel nor with your patch in release > > mode immediately. > > > > But according to the observed imbalance between inbounding and > > outgoing path, non-trivial penalty applies I guess? > > We should get a baseline using the same sources without my path > applied. > > Thanks, Roger. 
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2022-01-07 17:14 ` G.R. @ 2022-01-10 14:53 ` Roger Pau Monné 2022-01-11 14:24 ` G.R. 2022-10-30 16:36 ` G.R. 0 siblings, 2 replies; 33+ messages in thread From: Roger Pau Monné @ 2022-01-10 14:53 UTC (permalink / raw) To: G.R.; +Cc: xen-devel On Sat, Jan 08, 2022 at 01:14:26AM +0800, G.R. wrote: > On Wed, Jan 5, 2022 at 10:33 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > On Wed, Jan 05, 2022 at 12:05:39AM +0800, G.R. wrote: > > > > > > > But seems like this patch is not stable enough yet and has its own > > > > > > > issue -- memory is not properly released? > > > > > > > > > > > > I know. I've been working on improving it this morning and I'm > > > > > > attaching an updated version below. > > > > > > > > > > > Good news. > > > > > With this new patch, the NAS domU can serve iSCSI disk without OOM > > > > > panic, at least for a little while. > > > > > I'm going to keep it up and running for a while to see if it's stable over time. > > > > > > > > Thanks again for all the testing. Do you see any difference > > > > performance wise? > > > I'm still on a *debug* kernel build to capture any potential panic -- > > > none so far -- no performance testing yet. > > > Since I'm a home user with a relatively lightweight workload, so far I > > > didn't observe any difference in daily usage. > > > > > > I did some quick iperf3 testing just now. > > > > Thanks for doing this. > > > > > 1. between nas domU <=> Linux dom0 running on an old i7-3770 based box. > > > The peak is roughly 12 Gbits/s when domU is the server. > > > But I do see regression down to ~8.5 Gbits/s when I repeat the test in > > > a short burst. > > > The regression can recover when I leave the system idle for a while. > > > > > > When dom0 is the iperf3 server, the transfer rate is much lower, down > > > all the way to 1.x Gbits/s. 
> > > Sometimes, I can see the following kernel log repeats during the > > > testing, likely contributing to the slowdown. > > > interrupt storm detected on "irq2328:"; throttling interrupt source > > > > I assume the message is in the domU, not the dom0? > Yes, in the TrueNAS domU. > BTW, I rebooted back to the stock kernel and the message is no longer observed. > > With the stock kernel, the transfer rate from dom0 to nas domU can be > as high as 30Gbps. > The variation is still observed, sometimes down to ~19Gbps. There is > no retransmission in this direction. > > For the reverse direction, the observed low transfer rate still exists. > It's still within the range of 1.x Gbps, but should still be better > than the previous test. > The huge number of re-transmission is still observed. > The same behavior can be observed on a stock FreeBSD 12.2 image, so > this is not specific to TrueNAS. So that's domU sending the data, and dom0 receiving it. > > According to the packet capture, the re-transmission appears to be > caused by packet reorder. > Here is one example incident: > 1. dom0 sees a sequence jump in the incoming stream and begins to send out SACKs > 2. When SACK shows up at domU, it begins to re-transmit lost frames > (the re-transmit looks weird since it show up as a mixed stream of > 1448 bytes and 12 bytes packets, instead of always 1448 bytes) > 3. Suddenly the packets that are believed to have lost show up, dom0 > accept them as if they are re-transmission Hm, so there seems to be some kind of issue with ordering I would say. > 4. The actual re-transmission finally shows up in dom0... > Should we expect packet reorder on a direct virtual link? Sounds fishy to me. > Any chance we can get this re-transmission fixed? Does this still happen with all the extra features disabled? (-rxcsum -txcsum -lro -tso) > So looks like at least the imbalance between two directions are not > related to your patch. 
> Likely the debug build is a bigger contributor to the perf difference > in both directions. > > I also tried your patch on a release build, and didn't observe any > major difference in iperf3 numbers. > Roughly match the 30Gbps and 1.xGbps number on the stock release kernel. Thanks a lot, will try to get this upstream then. Roger. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2022-01-10 14:53 ` Roger Pau Monné @ 2022-01-11 14:24 ` G.R. 2022-10-30 16:36 ` G.R. 1 sibling, 0 replies; 33+ messages in thread From: G.R. @ 2022-01-11 14:24 UTC (permalink / raw) To: Roger Pau Monné; +Cc: xen-devel On Mon, Jan 10, 2022 at 10:54 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > On Sat, Jan 08, 2022 at 01:14:26AM +0800, G.R. wrote: > > On Wed, Jan 5, 2022 at 10:33 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > > > On Wed, Jan 05, 2022 at 12:05:39AM +0800, G.R. wrote: > > > > > > > > But seems like this patch is not stable enough yet and has its own > > > > > > > > issue -- memory is not properly released? > > > > > > > > > > > > > > I know. I've been working on improving it this morning and I'm > > > > > > > attaching an updated version below. > > > > > > > > > > > > > Good news. > > > > > > With this new patch, the NAS domU can serve iSCSI disk without OOM > > > > > > panic, at least for a little while. > > > > > > I'm going to keep it up and running for a while to see if it's stable over time. > > > > > > > > > > Thanks again for all the testing. Do you see any difference > > > > > performance wise? > > > > I'm still on a *debug* kernel build to capture any potential panic -- > > > > none so far -- no performance testing yet. > > > > Since I'm a home user with a relatively lightweight workload, so far I > > > > didn't observe any difference in daily usage. > > > > > > > > I did some quick iperf3 testing just now. > > > > > > Thanks for doing this. > > > > > > > 1. between nas domU <=> Linux dom0 running on an old i7-3770 based box. > > > > The peak is roughly 12 Gbits/s when domU is the server. > > > > But I do see regression down to ~8.5 Gbits/s when I repeat the test in > > > > a short burst. > > > > The regression can recover when I leave the system idle for a while. 
> > > > > > > > When dom0 is the iperf3 server, the transfer rate is much lower, down > > > > all the way to 1.x Gbits/s. > > > > Sometimes, I can see the following kernel log repeats during the > > > > testing, likely contributing to the slowdown. > > > > interrupt storm detected on "irq2328:"; throttling interrupt source > > > > > > I assume the message is in the domU, not the dom0? > > Yes, in the TrueNAS domU. > > BTW, I rebooted back to the stock kernel and the message is no longer observed. > > > > With the stock kernel, the transfer rate from dom0 to nas domU can be > > as high as 30Gbps. > > The variation is still observed, sometimes down to ~19Gbps. There is > > no retransmission in this direction. > > > > For the reverse direction, the observed low transfer rate still exists. > > It's still within the range of 1.x Gbps, but should still be better > > than the previous test. > > The huge number of re-transmission is still observed. > > The same behavior can be observed on a stock FreeBSD 12.2 image, so > > this is not specific to TrueNAS. > > So that's domU sending the data, and dom0 receiving it. Correct. > > > > > According to the packet capture, the re-transmission appears to be > > caused by packet reorder. > > Here is one example incident: > > 1. dom0 sees a sequence jump in the incoming stream and begins to send out SACKs > > 2. When SACK shows up at domU, it begins to re-transmit lost frames > > (the re-transmit looks weird since it show up as a mixed stream of > > 1448 bytes and 12 bytes packets, instead of always 1448 bytes) > > 3. Suddenly the packets that are believed to have lost show up, dom0 > > accept them as if they are re-transmission > > Hm, so there seems to be some kind of issue with ordering I would say. Agree. > > > 4. The actual re-transmission finally shows up in dom0... > > Should we expect packet reorder on a direct virtual link? Sounds fishy to me. > > Any chance we can get this re-transmission fixed? 
> Does this still happen with all the extra features disabled? (-rxcsum
> -txcsum -lro -tso)

No obvious impact, I would say. After disabling all extra features:
xn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 00:18:3c:51:6e:4c
	inet 192.168.1.9 netmask 0xffffff00 broadcast 192.168.1.255
	media: Ethernet manual
	status: active
	nd6 options=9<PERFORMNUD,IFDISABLED>

The iperf3 result:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.04 GBytes  1.75 Gbits/sec  12674         sender
[  5]   0.00-10.14  sec  2.04 GBytes  1.73 Gbits/sec                receiver

BTW, those extra features have a huge impact on the dom0 => domU direction.
It goes all the way down from ~30 / 18 Gbps to 3.5 / 1.8 Gbps (variation range) without those.
But there is no retransmission at all in either config for this direction.
I wonder why there is such a huge difference, since the NIC is purely virtual without any HW acceleration?

Any further suggestions on this retransmission issue?

> > So looks like at least the imbalance between two directions are not
> > related to your patch.
> > Likely the debug build is a bigger contributor to the perf difference
> > in both directions.
> >
> > I also tried your patch on a release build, and didn't observe any
> > major difference in iperf3 numbers.
> > Roughly match the 30Gbps and 1.xGbps number on the stock release kernel.
>
> Thanks a lot, will try to get this upstream then.
>
> Roger.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0 2022-01-10 14:53 ` Roger Pau Monné 2022-01-11 14:24 ` G.R. @ 2022-10-30 16:36 ` G.R. 2022-11-03 6:58 ` Paul Leiber 1 sibling, 1 reply; 33+ messages in thread
From: G.R. @ 2022-10-30 16:36 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel

On Mon, Jan 10, 2022 at 10:54 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > So looks like at least the imbalance between two directions are not
> > related to your patch.
> > Likely the debug build is a bigger contributor to the perf difference
> > in both directions.
> >
> > I also tried your patch on a release build, and didn't observe any
> > major difference in iperf3 numbers.
> > Roughly match the 30Gbps and 1.xGbps number on the stock release kernel.
>
> Thanks a lot, will try to get this upstream then.
>
> Roger.

Hi Roger, any news on the upstream fix? I haven't heard anything since...
The reason I came back to this thread is that I totally forgot about this
issue and upgraded to FreeNAS 13, only to rediscover it once again :-(

Any chance the patch can be applied to the FreeBSD 13.1-RELEASE-p1 kernel?

Thanks,
G.R.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0
  2022-10-30 16:36 ` G.R.
@ 2022-11-03  6:58 ` Paul Leiber
  2022-11-03 12:22 ` Roger Pau Monné

  0 siblings, 1 reply; 33+ messages in thread
From: Paul Leiber @ 2022-11-03 6:58 UTC (permalink / raw)
To: xen-devel

On 30.10.2022 17:36, G.R. wrote:
> On Mon, Jan 10, 2022 at 10:54 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>> So looks like at least the imbalance between two directions are not
>>> related to your patch.
>>> Likely the debug build is a bigger contributor to the perf difference
>>> in both directions.
>>>
>>> I also tried your patch on a release build, and didn't observe any
>>> major difference in iperf3 numbers.
>>> Roughly match the 30Gbps and 1.xGbps number on the stock release kernel.
>> Thanks a lot, will try to get this upstream then.
>>
>> Roger.
> Hi Roger, any news for the upstream fix? I haven't heard any news since...
> The reason I came back to this thread is that I totally forgot about
> this issue and upgraded to FreeNAS 13 only to rediscover this issue
> once again :-(
>
> Any chance the patch can apply on FreeBSD 13.1-RELEASE-p1 kernel?
>
> Thanks,
> G.R.
>

Hi,

I want to confirm that the patch in an official release would make
quite some people very happy. E.g. among OPNsense users, there are
some who suffer from the network issue [1]. FWIW, I compiled a kernel
including Roger's patch, and it seems to be working without trouble in
my OPNsense DomU.

Best regards,
Paul

[1] https://forum.opnsense.org/index.php?topic=28708.15
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0
  2022-11-03  6:58 ` Paul Leiber
@ 2022-11-03 12:22 ` Roger Pau Monné
  2022-12-14  6:16 ` G.R.

  0 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monné @ 2022-11-03 12:22 UTC (permalink / raw)
To: Paul Leiber, G.R.; +Cc: xen-devel

On Thu, Nov 03, 2022 at 07:58:52AM +0100, Paul Leiber wrote:
>
> On 30.10.2022 17:36, G.R. wrote:
> > On Mon, Jan 10, 2022 at 10:54 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > So looks like at least the imbalance between two directions are not
> > > > related to your patch.
> > > > Likely the debug build is a bigger contributor to the perf difference
> > > > in both directions.
> > > >
> > > > I also tried your patch on a release build, and didn't observe any
> > > > major difference in iperf3 numbers.
> > > > Roughly match the 30Gbps and 1.xGbps number on the stock release kernel.
> > > Thanks a lot, will try to get this upstream then.
> > >
> > > Roger.
> > Hi Roger, any news for the upstream fix? I haven't heard any news since...
> > The reason I came back to this thread is that I totally forgot about
> > this issue and upgraded to FreeNAS 13 only to rediscover this issue
> > once again :-(
> >
> > Any chance the patch can apply on FreeBSD 13.1-RELEASE-p1 kernel?
> >
> > Thanks,
> > G.R.
> >
>
> Hi,
>
> I want to confirm that the patch in an official release would make
> quite some people very happy. E.g. among OPNsense users, there are
> some who suffer from the network issue [1]. FWIW, I compiled a kernel
> including Roger's patch, and it seems to be working without trouble in
> my OPNsense DomU.

Hello to both,

Sorry, I completely dropped the ball on that patch, didn't even
remember I had it pending :(.

Will do a build test with it and commit later today, I don't think I
will get any feedback, and it seems to improve the situation for your
use-cases.

Thanks, Roger.
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0
  2022-11-03 12:22 ` Roger Pau Monné
@ 2022-12-14  6:16 ` G.R.
  2024-01-09 11:13 ` Niklas Hallqvist

  0 siblings, 1 reply; 33+ messages in thread
From: G.R. @ 2022-12-14 6:16 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: Paul Leiber, xen-devel

On Thu, Nov 3, 2022 at 8:37 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > > Roger.
> > > Hi Roger, any news for the upstream fix? I haven't heard any news since...
> > > The reason I came back to this thread is that I totally forgot about
> > > this issue and upgraded to FreeNAS 13 only to rediscover this issue
> > > once again :-(
> > >
> > > Any chance the patch can apply on FreeBSD 13.1-RELEASE-p1 kernel?
> > >
> > > Thanks,
> > > G.R.
> > >
> >
> > Hi,
> >
> > I want to confirm that the patch in an official release would make
> > quite some people very happy. E.g. among OPNsense users, there are
> > some who suffer from the network issue [1]. FWIW, I compiled a kernel
> > including Roger's patch, and it seems to be working without trouble
> > in my OPNsense DomU.
>
> Hello to both,
>
> Sorry, I completely dropped the ball on that patch, didn't even
> remember I had it pending :(.
>
> Will do a build test with it and commit later today, I don't think I
> will get any feedback, and it seems to improve the situation for your
> use-cases.

Hi Roger,
Just another query of the latest status. It'll be great if you can
share a link to the upstream commit.
I'm thinking of asking for a back-port of your fix to the FreeNAS
community, assuming it will take a long time to roll out otherwise.

Thanks,
G.R.

>
> Thanks, Roger.
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0
  2022-12-14  6:16 ` G.R.
@ 2024-01-09 11:13 ` Niklas Hallqvist
  2024-01-09 13:53 ` Roger Pau Monné

  0 siblings, 1 reply; 33+ messages in thread
From: Niklas Hallqvist @ 2024-01-09 11:13 UTC (permalink / raw)
To: G.R.; +Cc: Roger Pau Monné, Paul Leiber, xen-devel

> On 14 Dec 2022, at 07:16, G.R. <firemeteor@users.sourceforge.net> wrote:
>
> On Thu, Nov 3, 2022 at 8:37 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>>> Roger.
>>>> Hi Roger, any news for the upstream fix? I haven't heard any news since...
>>>> The reason I came back to this thread is that I totally forgot about
>>>> this issue and upgraded to FreeNAS 13 only to rediscover this issue
>>>> once again :-(
>>>>
>>>> Any chance the patch can apply on FreeBSD 13.1-RELEASE-p1 kernel?
>>>>
>>>> Thanks,
>>>> G.R.
>>>>
>>>
>>> Hi,
>>>
>>> I want to confirm that the patch in an official release would make
>>> quite some people very happy. E.g. among OPNsense users, there are
>>> some who suffer from the network issue [1]. FWIW, I compiled a kernel
>>> including Roger's patch, and it seems to be working without trouble
>>> in my OPNsense DomU.
>>
>> Hello to both,
>>
>> Sorry, I completely dropped the ball on that patch, didn't even
>> remember I had it pending :(.
>>
>> Will do a build test with it and commit later today, I don't think I
>> will get any feedback, and it seems to improve the situation for your
>> use-cases.
>
> Hi Roger,
> Just another query of the latest status. It'll be great if you can
> share a link to the upstream commit.
> I'm thinking of asking for a back-port of your fix to the FreeNAS
> community, assuming it will take a long time to roll out otherwise.
>
> Thanks,
> G.R.
>
>>
>> Thanks, Roger.

Hi everyone!

So did anything ever happen on this? I find myself in the same
situation with TrueNAS core 13, and can’t see any signs of changes in
the FreeBSD 13 branches.

G.R: Did you ever build a kernel for TrueNAS 13? Care to share it?
I don’t have a build environment setup, so I thought I’d just try to ask :-)

/Niklas
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0
  2024-01-09 11:13 ` Niklas Hallqvist
@ 2024-01-09 13:53 ` Roger Pau Monné
  2024-01-19 15:51 ` G.R.

  0 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monné @ 2024-01-09 13:53 UTC (permalink / raw)
To: Niklas Hallqvist; +Cc: G.R., Paul Leiber, xen-devel

On Tue, Jan 09, 2024 at 12:13:04PM +0100, Niklas Hallqvist wrote:
> > On 14 Dec 2022, at 07:16, G.R. <firemeteor@users.sourceforge.net> wrote:
> >
> > On Thu, Nov 3, 2022 at 8:37 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>>> Roger.
> >>>> Hi Roger, any news for the upstream fix? I haven't heard any news since...
> >>>> The reason I came back to this thread is that I totally forgot about
> >>>> this issue and upgraded to FreeNAS 13 only to rediscover this issue
> >>>> once again :-(
> >>>>
> >>>> Any chance the patch can apply on FreeBSD 13.1-RELEASE-p1 kernel?
> >>>>
> >>>> Thanks,
> >>>> G.R.
> >>>>
> >>>
> >>> Hi,
> >>>
> >>> I want to confirm that the patch in an official release would make
> >>> quite some people very happy. E.g. among OPNsense users, there are
> >>> some who suffer from the network issue [1]. FWIW, I compiled a kernel
> >>> including Roger's patch, and it seems to be working without trouble
> >>> in my OPNsense DomU.
> >>
> >> Hello to both,
> >>
> >> Sorry, I completely dropped the ball on that patch, didn't even
> >> remember I had it pending :(.
> >>
> >> Will do a build test with it and commit later today, I don't think I
> >> will get any feedback, and it seems to improve the situation for your
> >> use-cases.
> >
> > Hi Roger,
> > Just another query of the latest status. It'll be great if you can
> > share a link to the upstream commit.
> > I'm thinking of asking for a back-port of your fix to the FreeNAS
> > community, assuming it will take a long time to roll out otherwise.
> >
> > Thanks,
> > G.R.
> >
> >>
> >> Thanks, Roger.
>
> Hi everyone!
>
> So did anything ever happen on this? I find myself in the same
> situation with TrueNAS core 13, and can’t see any signs of changes in
> the FreeBSD 13 branches.

Hello,

I don't think the change is suitable to backport, it's IMO too
intrusive and risky. It was committed late 2022, and it's in 14.0:

commit dabb3db7a817f003af3f89c965ba369c67fc4910
Author: Roger Pau Monné <royger@FreeBSD.org>
Date:   Thu Nov 3 13:29:22 2022 +0100

    xen/netfront: deal with mbuf data crossing a page boundary

    There's been a report recently of mbufs with data that crosses a page
    boundary. It seems those mbufs are generated by the iSCSI target
    system:

    https://lists.xenproject.org/archives/html/xen-devel/2021-12/msg01581.html

    In order to handle those mbufs correctly on netfront use the bus_dma
    interface and explicitly request that segments must not cross a page
    boundary. No other requirements are necessary, so it's expected that
    bus_dma won't need to bounce the data and hence it shouldn't introduce
    a too big performance penalty.

    Using bus_dma requires some changes to netfront, mainly in order to
    accommodate for the fact that now ring slots no longer have a 1:1
    match with mbufs, as a single mbuf can use two ring slots if the data
    buffer crosses a page boundary. Store the first packet of the mbuf
    chain in every ring slot that's used, and use a mbuf tag in order to
    store the bus_dma related structures and a refcount to keep track of
    the pending slots before the mbuf chain can be freed.

    Reported by: G.R.
    Tested by: G.R.
    MFC: 1 week
    Differential revision: https://reviews.freebsd.org/D33876

TrueNAS/OPNsense might consider picking it up themselves.

Thanks, Roger.
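The core idea in the commit message above — a single mbuf may need two ring slots when its data crosses a page boundary — comes down to splitting each buffer at 4 KiB boundaries, which is the constraint the commit asks bus_dma to enforce. A minimal sketch of that splitting rule (an illustration only, not the FreeBSD bus_dma implementation):

```python
PAGE_SIZE = 4096  # size of a Xen grant page

def split_at_page_boundaries(offset: int, size: int):
    """Split a buffer into (offset, length) segments, none of which
    crosses a page boundary. Each segment maps to one netfront ring
    slot, so a buffer straddling one boundary consumes two slots."""
    segments = []
    while size > 0:
        room_in_page = PAGE_SIZE - (offset % PAGE_SIZE)
        length = min(size, room_in_page)
        segments.append((offset, length))
        offset += length
        size -= length
    return segments

# The offending buffer from the original report (offset 2872, size 1460)
# needs two segments, hence two ring slots:
print(split_at_page_boundaries(2872, 1460))  # [(2872, 1224), (4096, 236)]
```

This also shows why the fix could not stay a 1:1 mbuf-to-slot mapping: the second segment belongs to the same mbuf, so the commit tracks pending slots with a refcount before the chain can be freed.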
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0
  2024-01-09 13:53 ` Roger Pau Monné
@ 2024-01-19 15:51 ` G.R.

  0 siblings, 0 replies; 33+ messages in thread
From: G.R. @ 2024-01-19 15:51 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: Niklas Hallqvist, Paul Leiber, xen-devel

On Tue, Jan 9, 2024 at 11:28 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Tue, Jan 09, 2024 at 12:13:04PM +0100, Niklas Hallqvist wrote:
> > > On 14 Dec 2022, at 07:16, G.R. <firemeteor@users.sourceforge.net> wrote:
...
> > > Hi Roger,
> > > Just another query of the latest status. It'll be great if you can
> > > share a link to the upstream commit.
> > > I'm thinking of asking for a back-port of your fix to the FreeNAS
> > > community, assuming it will take a long time to roll out otherwise.
> > >
> > > Thanks,
> > > G.R.
> > >
> > >>
> > >> Thanks, Roger.
> >
> > Hi everyone!
> >
> > So did anything ever happen on this? I find myself in the same
> > situation with TrueNAS core 13, and can’t see any signs of changes in
> > the FreeBSD 13 branches.
>
> Hello,
>
> I don't think the change is suitable to backport, it's IMO too
> intrusive and risky. It was committed late 2022, and it's in 14.0:
>
...
> TrueNAS/OPNsense might consider picking it up themselves.
>
> Thanks, Roger.

Just FYI: I was able to locate that commit in the 14.0 tree and
cherry-picked it into TrueNAS 13. I did it last November and the build
has been running stably without issue since.
The fix was fairly standalone and didn't cause any trouble during the
cherry-picking, so it could be reasonable to raise a request in the
TrueNAS forum.

Thanks,
G.R.
* Re: Possible bug? DOM-U network stopped working after fatal error reported in DOM0
  2021-12-18 18:35 Possible bug? DOM-U network stopped working after fatal error reported in DOM0 G.R.
  2021-12-19  6:10 ` Juergen Gross
  2021-12-19 17:31 ` G.R.
@ 2021-12-20 13:51 ` Roger Pau Monné

  2 siblings, 0 replies; 33+ messages in thread
From: Roger Pau Monné @ 2021-12-20 13:51 UTC (permalink / raw)
To: G.R.; +Cc: xen-devel

On Sun, Dec 19, 2021 at 02:35:56AM +0800, G.R. wrote:
> Hi all,
>
> I ran into the following error report in the DOM0 kernel after a recent upgrade:
> [  501.840816] vif vif-1-0 vif1.0: Cross page boundary, txp->offset:
> 2872, size: 1460
> [  501.840828] vif vif-1-0 vif1.0: fatal error; disabling device
> [  501.841076] xenbr0: port 2(vif1.0) entered disabled state
> Once this error happens, the DOM-U behind this vif is no-longer
> accessible. And recreating the same DOM-U does not fix the problem.
> Only a reboot on the physical host machine helps.
>
> The problem showed up after a recent upgrade on the DOM-U OS from
> FreeNAS 11.3 to TrueNAS 12.0U7 and breaks the iSCSI service while
> leaving other services like NFS intact.
> The underlying OS for the NAS is FreeBSD, version 11.3 and 12.2 respectively.
> So far I have tried the following combos:
> - Linux 4.19 DOM0 + XEN 4.8 + FreeBSD 11.3 DOM-U: Good
> - Linux 4.19 DOM0 + XEN 4.8 + FreeBSD 12.2 DOM-U: Regressed
> - Linux 5.10 DOM0 + XEN 4.8 + FreeBSD 12.2 DOM-U: Regressed
> - Linux 5.10 DOM0 + XEN 4.11 + FreeBSD 12.2 DOM-U: Regressed
>
> I plan to try out the XEN 4.14 version which is the latest I can get
> from the distro (Debian).
> If that still does not fix the problem, I would build the 4.16 version
> from source as my last resort.
>
> I have to admit that this trial process is blind as I have no idea
> which component in the combo is to be blamed. Is it a bug in the
> backend-driver, frontend-driver or the hypervisor itself? Or due to
> incompatible versions? Any suggestion on other diagnose ideas (e.g.
> debug logs) will be welcome, while I work on the planned experiments.

This is a bug in FreeBSD netfront, so it will show up no matter which
Linux or Xen version you use.

Does it make a difference if you disable TSO and LRO from netfront?

$ ifconfig xn0 -tso -lro

Do you have instructions I can follow in order to try to reproduce the
issue?

Thanks, Roger.
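For readers following along: the rejected request simply fails a single-page containment check. A minimal sketch of that rule (an illustration of why the logged values trip it, not the actual Linux netback source):

```python
PAGE_SIZE = 4096  # size of a Xen grant page

def crosses_page_boundary(offset: int, size: int) -> bool:
    """True if a request's payload spills past the end of its 4 KiB page.

    netback treats such a request as malformed and disables the vif,
    which is the "fatal error; disabling device" message seen in dom0.
    """
    return (offset % PAGE_SIZE) + size > PAGE_SIZE

# The values from the dom0 log: txp->offset = 2872, size = 1460.
# 2872 + 1460 = 4332 > 4096, so the request is rejected.
print(crosses_page_boundary(2872, 1460))  # True
```

A payload that ends exactly at the page boundary (e.g. offset 2636, size 1460) would still be accepted; only data extending past byte 4096 of the granted page is invalid.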
end of thread, other threads:[~2024-01-19 15:52 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-18 18:35 Possible bug? DOM-U network stopped working after fatal error reported in DOM0 G.R.
2021-12-19  6:10 ` Juergen Gross
2021-12-19 17:31 ` G.R.
2021-12-20 17:13 ` G.R.
2021-12-21 13:50 ` Roger Pau Monné
2021-12-21 18:19 ` G.R.
2021-12-21 19:12 ` Roger Pau Monné
2021-12-23 15:49 ` G.R.
2021-12-24 11:24 ` Roger Pau Monné
2021-12-25 16:39 ` G.R.
2021-12-25 18:06 ` G.R.
2021-12-27 19:04 ` Roger Pau Monné
[not found] ` <CAKhsbWY5=vENgwgq3NV44KSZQgpOPY=33CMSZo=jweAcRDjBwg@mail.gmail.com>
2021-12-29  8:32 ` Roger Pau Monné
2021-12-29  9:13 ` G.R.
2021-12-29 10:27 ` Roger Pau Monné
2021-12-29 19:07 ` Roger Pau Monné
2021-12-30 15:12 ` G.R.
2021-12-30 18:51 ` Roger Pau Monné
2021-12-31 14:47 ` G.R.
2022-01-04 10:25 ` Roger Pau Monné
2022-01-04 16:05 ` G.R.
2022-01-05 14:33 ` Roger Pau Monné
2022-01-07 17:14 ` G.R.
2022-01-10 14:53 ` Roger Pau Monné
2022-01-11 14:24 ` G.R.
2022-10-30 16:36 ` G.R.
2022-11-03  6:58 ` Paul Leiber
2022-11-03 12:22 ` Roger Pau Monné
2022-12-14  6:16 ` G.R.
2024-01-09 11:13 ` Niklas Hallqvist
2024-01-09 13:53 ` Roger Pau Monné
2024-01-19 15:51 ` G.R.
2021-12-20 13:51 ` Roger Pau Monné