Re: [PATCH] staging: wfx: add gcc extension __force cast

From: Jerome Pouiller <Jerome.Pouiller@silabs.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: "devel@driverdev.osuosl.org" <devel@driverdev.osuosl.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	Jules Irenge <jbi.octave@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Boqun.Feng@microsoft.com" <Boqun.Feng@microsoft.com>
Subject: Re: [PATCH] staging: wfx: add gcc extension __force cast
Date: Tue, 12 Nov 2019 11:32:07 +0000	[thread overview]
Message-ID: <2852964.2jKPfdd9jE@pc-42> (raw)
In-Reply-To: <20191111202852.GX26530@ZenIV.linux.org.uk>

Hello Al,

Thank you for your extensive review.

On Monday 11 November 2019 21:28:52 CET Al Viro wrote:
> On Mon, Nov 11, 2019 at 01:51:33PM +0000, Jules Irenge wrote:
> >
> > > NAK.  force-cast (and it's not a gcc extension, BTW - it's sparse) is basically
> > > "I know better; the code is right, so STFU already".  *IF* counters.count_...
> > > is really little-endian 32bit, then why isn't it declared that way?  And if
> > > it's host-endian, you've just papered over a real bug here.
> > >
> > > As a general rule "fix" doesn't mean "tell it to shut up"...
> > >
> >
> > Thanks for the comments, I have updated  but I have a mixed mind on the
> > __le32. I have to read more about it.
> >
> > I would appreciate if you can comment again on the update.
> 
> From the look at the driver, it seems that all these counters are a part of
> structure that is read from the hardware and only used as little-endian.
> So just declare all fields in struct hif_mib_extended_count_table as
> __le32; easy enough.  Looking a bit further, the same goes for
> struct hif_mib_bcn_filter_table ->num_of_info_elmts
> struct hif_mib_keep_alive_period ->keep_alive_period (__le16)
> struct hif_mib_template_frame ->frame_length (__le16)
> struct hif_mib_set_association_mode ->basic_rate_set (__le32)
> struct hif_req_update_ie ->num_i_es (__le16)
> struct hif_req_write_mib ->mib_id, ->length (__le16 both)
> struct hif_req_read_mib ->mib_id (__le16)
> struct hif_req_configuration ->length (__le16)

Indeed, structs declared in hif_api* are shared with the hardware and
should use __le16/__le32. However, as you noticed below, these structs
are sometime used in other parts of the code that are not related to
the hardware. 

I have in my local queue a set of patches that improve the situation.
Objective is to limit usage of hif structs to hif_tx.c, hif_tx_mib.c
and hif_rx.c (which are correct places to handle hardware
communication). I hope to be able to submit these patches in 2 weeks.

[...]
> and that's where the real bugs start to show up; leaving the misbegotten
> forest of macros in misbegotten tracing shite aside, we have this:
> 
> static const struct ieee80211_supported_band wfx_band_2ghz = {
>         .channels = wfx_2ghz_chantable,
>         .n_channels = ARRAY_SIZE(wfx_2ghz_chantable),
>         .bitrates = wfx_rates,
>         .n_bitrates = ARRAY_SIZE(wfx_rates),
>         .ht_cap = {
>                 // Receive caps
>                 .cap = IEEE80211_HT_CAP_GRN_FLD | IEEE80211_HT_CAP_SGI_20 |
>                        IEEE80211_HT_CAP_MAX_AMSDU | (1 << IEEE80211_HT_CAP_RX_STBC_SHIFT),
>                 .ht_supported = 1,
>                 .ampdu_factor = IEEE80211_HT_MAX_AMPDU_16K,
>                 .ampdu_density = IEEE80211_HT_MPDU_DENSITY_NONE,
>                 .mcs = {
>                         .rx_mask = { 0xFF }, // MCS0 to MCS7
>                         .rx_highest = 65,
> drivers/staging/wfx/main.c:108:39: refering to this initializer.
> Sparse say that it expects rx_highest to be __le16.  And that's
> not a driver-specific structure; it's generic ieee80211 one.  Which
> says
> struct ieee80211_mcs_info {
>         u8 rx_mask[IEEE80211_HT_MCS_MASK_LEN];
>         __le16 rx_highest;
>         u8 tx_params;
>         u8 reserved[3];
> } __packed;
> and grepping for rx_highest through the tree shows that everything else
> is treating it as little-endian 16bit.
> 
> Almost certainly a bug on big-endian hosts; should be .rx_highest = cpu_to_le16(65),
> instead.

Agree.

> Looking for more low-hanging fruits, we have
> static int indirect_read32_locked(struct wfx_dev *wdev, int reg, u32 addr, u32 *val)
> {
>         int ret;
>         __le32 *tmp = kmalloc(sizeof(u32), GFP_KERNEL);
> 
>         if (!tmp)
>                 return -ENOMEM;
>         wdev->hwbus_ops->lock(wdev->hwbus_priv);
>         ret = indirect_read(wdev, reg, addr, tmp, sizeof(u32));
>         *val = cpu_to_le32(*tmp);
>         _trace_io_ind_read32(reg, addr, *val);
>         wdev->hwbus_ops->unlock(wdev->hwbus_priv);
>         kfree(tmp);
>         return ret;
> }
> with warnings about val = cpu_to_le32(*tmp); fair enough, since *val is
> host-endian (u32) and *tmp - little-endian.  Trivial misannotation -
> it should've been le32_to_cpu(), not cpu_to_le32().  Same mapping on
> all CPUs we are ever likely to support, so it's just a misannotation,
> not a bug per se.

Agree.

> drivers/staging/wfx/hif_tx_mib.h:34:38: warning: incorrect type in initializer (different base types)
> drivers/staging/wfx/hif_tx_mib.h:34:38:    expected unsigned char [usertype] wakeup_period_max
> drivers/staging/wfx/hif_tx_mib.h:34:38:    got restricted __le16 [usertype]
> 
> is about
> static inline int hif_set_beacon_wakeup_period(struct wfx_vif *wvif,
>                                                unsigned int dtim_interval,
>                                                unsigned int listen_interval)
> {
>         struct hif_mib_beacon_wake_up_period val = {
>                 .wakeup_period_min = dtim_interval,
>                 .receive_dtim = 0,
>                 .wakeup_period_max = cpu_to_le16(listen_interval),
>         };
> and struct hif_mib_beacon_wake_up_period has wakeup_period_max declared
> as uint8_t.  We are shoving a le16 value into it.  Almost certain bug -
> that will result in the listen_interval % 256 on litte-endian host and
> listen_interval / 256 on big-endian one.  Looking at the callers to
> see what's actually passed as listen_interval shows only
>         hif_set_beacon_wakeup_period(wvif, wvif->dtim_period, wvif->dtim_period);
> and dtim_period in *wvif (struct wfx_vif) can be assigned 0, 1 or
> values coming from struct ieee80211_tim_ie ->dtim_period or
> struct ieee80211_bss_conf ->dtim_period, 8bit in either structure.
> 
> In other words, the value stored in val.wakeup_period_max will be
> always zero on big-endian hosts.  Definitely bogus, should just
> store that (8bit) value as-is; cpu_to_le16() is wrong here.

Absolutely agree.

> Next piece of fun:
> static inline int hif_beacon_filter_control(struct wfx_vif *wvif,
>                                             int enable, int beacon_count)
> {
>         struct hif_mib_bcn_filter_enable arg = {
>                 .enable = cpu_to_le32(enable),
>                 .bcn_count = cpu_to_le32(beacon_count),
>         };
>         return hif_write_mib(wvif->wdev, wvif->id,
>                              HIF_MIB_ID_BEACON_FILTER_ENABLE, &arg, sizeof(arg));
> }
> Sounds like ->enable and ->bcn_count should both be __le32, which makes
> sense since the structs passed to hardware appear to be fixed-endian on
> that thing.  However, annotating them as such adds warnigns:
> drivers/staging/wfx/sta.c:246:35: warning: incorrect type in assignment (different base types)
> drivers/staging/wfx/sta.c:246:35:    expected restricted __le32 [assigned] [usertype] bcn_count
> drivers/staging/wfx/sta.c:246:35:    got int
> drivers/staging/wfx/sta.c:249:32: warning: incorrect type in assignment (different base types)
> drivers/staging/wfx/sta.c:249:32:    expected restricted __le32 [assigned] [usertype] enable
> drivers/staging/wfx/sta.c:249:32:    got int
> drivers/staging/wfx/sta.c:253:32: warning: incorrect type in assignment (different base types)
> drivers/staging/wfx/sta.c:253:32:    expected restricted __le32 [assigned] [usertype] enable
> drivers/staging/wfx/sta.c:253:32:    got int
> drivers/staging/wfx/sta.c:262:62: warning: incorrect type in argument 2 (different base types)
> drivers/staging/wfx/sta.c:262:62:    expected int enable
> drivers/staging/wfx/sta.c:262:62:    got restricted __le32 [assigned] [usertype] enable
> drivers/staging/wfx/sta.c:262:78: warning: incorrect type in argument 3 (different base types)
> drivers/staging/wfx/sta.c:262:78:    expected int beacon_count
> drivers/staging/wfx/sta.c:262:78:    got restricted __le32 [assigned] [usertype] bcn_count
> 
> All in the same function (wfx_update_filtering()) and we really do store
> host-endian values in those (first 3 places).  In the last one we pass
> them to hif_beacon_filter_control(), which does expect host-endian.
> And that's the only thing we do to the instance of hif_mib_bcn_filter_enable
> in there...
> 
> Possible solutions:
>         1) store them little-endian there, pass to hif_beacon_filter_control()
> already l-e, get rid of cpu_to_le32() in the latter.
>         2) store them little-endian, pass the entire pointer to struct
> instead of forming it again in hif_beacon_filter_control()
>         3) don't pretend that the objects in hif_beacon_filter_control()
> and in wfx_update_filtering() are of the same type (different layouts on
> big-endian) and replace the one in the caller with two local variables.
> My preference would be (3), as in
[...]
> but that's a matter of taste.

Yes, this is one of the difficult parts. I work on it (I opted for
solution 3).

> Next is bx.c warning about __le32; that's about num_tx_count being fed to cpu_to_le32().
> grepping for that thing results in
> drivers/staging/wfx/bh.c:106:                   release_count = le32_to_cpu(((struct hif_cnf_multi_transmit *)hif->body)->num_tx_confs);
> drivers/staging/wfx/hif_api_cmd.h:316:  uint32_t   num_tx_confs;
> drivers/staging/wfx/hif_rx.c:78:        int count = body->num_tx_confs;
> which is troubling - the first line (in rx_helper()) expects to find
> a little-endian value in that field, while the last (in hif_multi_tx_confirm())
> - a host-endian, with nothing in sight that might account for conversion
> from one to another.
> 
> Let's look at the call chains: hif_multi_tx_confirm() is called only as
> hif_handlers[...]->handler(), which happens in in wfx_handle_rx().
> The call is
>                                 hif_handlers[i].handler(wdev, hif, hif->body);
> and hif has come from
>         struct hif_msg *hif = (struct hif_msg *) skb->data;
> wfx_handle_rx() is called by the same rx_helper()...  skb is created by
> rx_helper() and apparently filled by the call
>         if (wfx_data_read(wdev, skb->data, alloc_len))
>                 goto err;
> right next to the allocation... and prior to the
>                    release_count = le32_to_cpu(((struct hif_cnf_multi_transmit *)hif->body)->num_tx_confs);
> where we expect little-endian, with nothing to modify the skb contents
> between that and the call of wfx_handle_rx().  hif in rx_helper() points
> to the same place - skb->data.  OK, we almost certainly have a bug here.
>
> That thing allocates a packet and fills it with incoming data.  Then
> it parses the damn thing, apparently treating the same field of the
> incoming as little-endian in one place and host-endian in another.
> In principle it's possible that the rest of the packet determines
> which one it is, but by the look of that code both places are
> hit if and only if hif->id is equal to HIF_CNF_ID_MULTI_TRANSMIT.
> It *can't* be correct on big-endian.  Not even theoretically.
> 
> And since it's over-the-wire data, I would expect it to be fixed-endian.
> That needs to be confirmed with the driver's authors and/or direct
> experiment on big-endian host, but I strongly suspect that the right
> fix is to have
>         int count = le32_to_cpu(body->num_tx_confs);
> in hif_multi_tx_confirm() (and num_tx_confs annotated as __le32).

Indeed, num_tx_confs is always a le32 value.

> HOWEVER, that opens another nasty can of worms.  We have
>         struct hif_cnf_tx *buf_loc = (struct hif_cnf_tx *) &body->tx_conf_payload;
> ...
>         for (i = 0; i < count; ++i) {
>                 wfx_tx_confirm_cb(wvif, buf_loc);
>                 buf_loc++;
>         }
> with count derived from the packet and body pointing into the packet.  And no
> visible checks that would make sure the loop won't run out of the data we'd
> actually got.
> 
> The check in rx_helper() verifies that hif->len matches the amount we'd
> received; the check for ->num_tx_confs in there doesn't look like what
> we'd needed (that would be offset of body.tx_conf_payload in packet +
> num_tx_confs * sizeof(struct hif_cnf_multi_transmit) compared to
> actual size).
> 
> So it smells like a remote buffer overrun, little-endian or not.
> And at that point I would start looking for driver original authors with
> some rather pointed questions about the validation of data and lack
> thereof.

There are not so much checks done on data retrieved from the hardware.
I think we can find other similar issues in the driver.

In this particular case, indeed, a little check on length of received
data could be a good idea.

> BTW, if incoming packets are fixed-endian, I would expect more bugs on
> big-endian hosts - wfx_tx_confirm_cb() does things like
>                 tx_info->status.tx_time =
>                 arg->media_delay - arg->tx_queue_delay;
> with media_delay and tx_queue_delay both being 32bit fields in the
> incoming packet.  So another question to the authors (or documentation,
> or direct experiments) is what endianness do various fields in the incoming
> data have.  We can try and guess, but...

Fortunately, answer is simple enough: everything from hardware is
little endian :).

Jules, do you want to take care of fixing theses issues (except the one
about wfx_update_filtering())?

-- 
Jérôme Pouiller

_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel