Re: [Wireshark-dev] Multiple-line parsing of packets dissected over HTTP

From: Jonathan Nieder <jrnieder@gmail.com>
To: Developer support list for Wireshark <wireshark-dev@wireshark.org>
Cc: Joey Salazar <jgsal@protonmail.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [Wireshark-dev] Multiple-line parsing of packets dissected over HTTP
Date: Tue, 19 Jan 2021 16:09:55 -0800	[thread overview]
Message-ID: <YAd0044tbht/DOKq@google.com> (raw)
In-Reply-To: <CAGka-81Ze71igsW2zsfzYqqkr+sz5FE6YYo3mEpdEQUSUxZvaw@mail.gmail.com>

Hi,

Pascal Quantin wrote:
> Le mar. 19 janv. 2021 à 17:45, Joey Salazar via Wireshark-dev <
> wireshark-dev@wireshark.org> a écrit :

>> In commit 33af2649 [1] we can keep dissecting the contents of the req,
>> adv, and res packets by setting
>>  while (plen > 0) { }
>> either in `dissect_git_pdu()` or in `dissect_one_pkt_line()`, but for now
>> in `dissect_git_pdu()` it'd be a bit messy, so wanted to ask for your
>> feedback for getting `dissect_one_pkt_line()` to work properly first.
>>
>> As you can see in pcap 169 [2], it correctly parses the length of the
>> first line as 0x0014 (20 bytes) until `0x0a`, then it's supposed to get the
>> length of the next line by the first 4 hex bytes in that line, but instead
>> of reading the length as 0x0018 (24 bytes) it's reading it as 0x0010 (16
>> bytes), and anyways, this particular line's length actually is 59 bytes.

Interesting.  Let me summarize your question: getting the information
in one place here, the relevant code at line 114 looks like

| +  while (plen > 0) {
| +    proto_tree_add_uint(git_tree, hf_git_packet_len, tvb, offset, 4, plen);
| +    offset += 4;
| +    plen -= 4;
| +
| +    proto_tree_add_item(git_tree, hf_git_packet_data, tvb, offset, plen, ENC_NA);
| +    offset += plen;
| +    // To-do: add lines for parsing of terminator packet 0000
| +  }

The relevant part of the pcap is shown in an image; transcribing
imperfectly, I see

| 0014command=ls-refs\n
| 0018agent=git/2.29.0.rc2
| 0016object-format=sha1
| 0001
[etc]

where \n denotes a newline byte and there are no newlines between
these pkt-lines.

That first pkt-line has 4 bytes for the length, followed by 12 bytes
for "command=ls-refs\n", including newline.  So why does this parse as

	packet-length: 0x0014
	packet data: "command=ls-refs\n"
	packet-length: 0x0010
	packet data: "agent=[etc]"

?

[...]
> So what is the code leading to this dissection? It does not seem to be
> https://gitlab.com/joeysal/wireshark/-/commit/33af2649927cb5660d4aeb64b9a9e9a58a1823aa
> as dissect_one_pkt_line() seem to read only one line (BTW using a while
> loop in this commit is useless as you are incrementing offset by plen, and
> the code you shared considers that plen includes the 4 bytes of the packet
> length field while your screenshot does not assume that).

This reply is a bit dense, but it contains the hints to move forward:

First, there's the matter of the contract of the dissect_one_pkt_line()
function.  The name suggests it would dissect a *single* pkt-line, but
it has this loop in it.  What does it actually do?  And what do we
want it to do?

That second question is easy to answer: this code will be much easier
to read if dissect_one_pktline dissects a single pkt-line!  For
example, if we imagine a contract like

	/** Dissects a single pkt-line.
	 *
	 * @param[in] tvb Buffer containing a pkt-line.
	 * @param offset Offset at which to start reading.
	 * @param[out] tree Tree to attach the dissected pkt-line to.
	 * @return Number of bytes dissected, or -1 on error.
	 */
	static int dissect_one_pkt_line(tvbuff_t *tvb, int offset, proto_tree *tree)

then we could call this in a loop, like:

	int offset = 0;

	while (offset < total length)
		offset += dissect_one_pkt_line(tvb, offset, tree);

Obtaining the total length and including some error handling left as
an exercise to the reader.

As for the first question: what does the current code do?  The loop at
l114 doesn't modify plen except by subtracting 4 from it.  So instead
of reading the pkt-length from the next pkt-line, it assumes it is 4
bytes less.  0x14 - 4 is 0x10, hence the 0x10 as pkt length
assumption.

Thanks and hope that helps,
Jonathan