On Mon Nov 23 2020, Vladimir Oltean wrote: > Hi Kurt, > > On Sat, Nov 21, 2020 at 12:57:03PM +0100, Kurt Kanzenbach wrote: >> The switch has support for the 802.1Qbv Time Aware Shaper (TAS). Traffic >> schedules may be configured individually on each front port. Each port has eight >> egress queues. The traffic is mapped to a traffic class respectively via the PCP >> field of a VLAN tagged frame. >> >> The TAPRIO Qdisc already implements that. Therefore, this interface can simply >> be reused. Add .port_setup_tc() accordingly. >> >> The activation of a schedule on a port is split into two parts: >> >> * Programming the necessary gate control list (GCL) >> * Setup delayed work for starting the schedule >> >> The hardware supports starting a schedule up to eight seconds in the future. The >> TAPRIO interface provides an absolute base time. Therefore, periodic delayed >> work is leveraged to check whether a schedule may be started or not. >> >> Signed-off-by: Kurt Kanzenbach >> --- >> drivers/net/dsa/hirschmann/hellcreek.c | 314 +++++++++++++++++++++++++ >> drivers/net/dsa/hirschmann/hellcreek.h | 22 ++ >> 2 files changed, 336 insertions(+) >> >> diff --git a/drivers/net/dsa/hirschmann/hellcreek.c b/drivers/net/dsa/hirschmann/hellcreek.c >> index 6420b76ea37c..35514af1922a 100644 >> --- a/drivers/net/dsa/hirschmann/hellcreek.c >> +++ b/drivers/net/dsa/hirschmann/hellcreek.c >> @@ -23,6 +23,7 @@ >> #include >> #include >> #include >> +#include >> >> #include "hellcreek.h" >> #include "hellcreek_ptp.h" >> @@ -153,6 +154,13 @@ static void hellcreek_select_vlan(struct hellcreek *hellcreek, int vid, >> hellcreek_write(hellcreek, val, HR_VIDCFG); >> } >> >> +static void hellcreek_select_tgd(struct hellcreek *hellcreek, int port) >> +{ >> + u16 val = port << TR_TGDSEL_TDGSEL_SHIFT; >> + >> + hellcreek_write(hellcreek, val, TR_TGDSEL); >> +} >> + >> static int hellcreek_wait_until_ready(struct hellcreek *hellcreek) >> { >> u16 val; >> @@ -1135,6 +1143,308 @@ hellcreek_port_prechangeupper(struct dsa_switch *ds, int port, >> return ret; >> } >> >> +static void hellcreek_setup_gcl(struct hellcreek *hellcreek, int port, >> + const struct hellcreek_schedule *schedule) >> +{ >> + size_t i; >> + >> + for (i = 1; i <= schedule->num_entries; ++i) { >> + const struct hellcreek_gcl_entry *cur, *initial, *next; >> + u16 data; >> + u8 gates; >> + >> + cur = &schedule->entries[i - 1]; >> + initial = &schedule->entries[0]; >> + next = &schedule->entries[i]; >> + > > I would find it more intuitive to have the invariant assignment out of > the loop. > > const struct hellcreek_gcl_entry *cur, *initial, *next; > > cur = initial = &schedule->entries[0]; > next = cur + 1; > > for (i = 0; i < schedule->num_entries; i++) { > u16 data; > u8 gates; > > [...] > > cur++; > next++; > } Sure. Could also do for (i = 0; i < schedule->num_entries; i++, cur++, next++) > >> + if (i == schedule->num_entries) >> + gates = initial->gate_states ^ >> + cur->gate_states; >> + else >> + gates = next->gate_states ^ >> + cur->gate_states; >> + >> + data = gates; >> + if (cur->overrun_ignore) >> + data |= TR_GCLDAT_GCLOVRI; >> + >> + if (i == schedule->num_entries) >> + data |= TR_GCLDAT_GCLWRLAST; >> + >> + /* Gates states */ >> + hellcreek_write(hellcreek, data, TR_GCLDAT); >> + >> + /* Time intervall */ > > interval Ah. That's German spelling. > >> + hellcreek_write(hellcreek, >> + cur->interval & 0x0000ffff, >> + TR_GCLTIL); >> + hellcreek_write(hellcreek, >> + (cur->interval & 0xffff0000) >> 16, >> + TR_GCLTIH); >> + >> + /* Commit entry */ >> + data = ((i - 1) << TR_GCLCMD_GCLWRADR_SHIFT) | >> + (initial->gate_states << >> + TR_GCLCMD_INIT_GATE_STATES_SHIFT); >> + hellcreek_write(hellcreek, data, TR_GCLCMD); > > So the command register holds the initial gate states. When the command > register is written, it samples the data register, populating GCL entry > number GCLWRADR with that information. Every GCL entry contains the > duration until it is executed, and the gates that need to change when > the schedule transitions towards the next GCL entry. Right? > > Why are the initial gate states written each time into the command > register? Is it required by the hardware, or just easier in the > implementation? I think it's required. That sequence is from the programming guide. > >> + } >> +} >> + >> +static void hellcreek_set_cycle_time(struct hellcreek *hellcreek, >> + const struct hellcreek_schedule *schedule) >> +{ >> + u32 cycle_time = schedule->cycle_time; >> + >> + hellcreek_write(hellcreek, cycle_time & 0x0000ffff, TR_CTWRL); >> + hellcreek_write(hellcreek, (cycle_time & 0xffff0000) >> 16, TR_CTWRH); > > The cycle_time in struct tc_taprio_qopt_offload is 64-bit, so you should > NACK a value that is too large and return an appropriate extack > message. Ok, makes sense. > >> +} >> + >> +static void hellcreek_switch_schedule(struct hellcreek *hellcreek, >> + ktime_t start_time) >> +{ >> + struct timespec64 ts = ktime_to_timespec64(start_time); >> + >> + /* Start can be up to eight seconds in the future */ >> + ts.tv_sec %= 8; > > What happens if base_time is specified as zero, or a small number that > only encodes a phase? > > I would expect that the base time is advanced by an integer number of > cycles until it becomes in the immediate future, so that it can be > applied. Yes, that's missing/not implemented. If the admin base time is in the past, a valid starting point in the future has to be calculated by the driver. > > I don't think that's what happens right now. > If the start_time is 0.000000000, the cycle_time is 0.333333333, and now > is 8.125000000. > > I believe that what would happen is: > - hellcreek_schedule_startable() says "yes, it's startable right away", > because base_time_ns - current_ns) is negative, and therefore also > smaller than 8 * NSEC_PER_SEC. > - hellcreek_switch_schedule() gets called with the 0.000000000 time, and > this start_time gets programmed right away into hardware. > > What does the hardware do if it's given a schedule that begins at 0.000000000 > when the current time is at 8.125000000? > > I would expect that somebody (hardware or driver) advances the time > 0.000000000 into the first time that is larger than 8.125000000, while > still remaining congruent to the original time in terms of phase offset. > > I.e. advance the base time by 25 cycle times, to get > 0.000000000 + 25 x 0.333333333 = 8.333333325 ns. > > Then I would expect the schedule to start at 8.333333325 nanoseconds, > which is the first value immediately larger than "now" (8.125000000). Correct. > > When you give the hardware the base_time of 0.000000000, is this what > happens? Is it equivalent to giving it 0.333333325? > >> + >> + /* Start schedule at this point of time */ >> + hellcreek_write(hellcreek, ts.tv_nsec & 0x0000ffff, TR_ESTWRL); >> + hellcreek_write(hellcreek, (ts.tv_nsec & 0xffff0000) >> 16, TR_ESTWRH); >> + >> + /* Arm timer, set seconds and switch schedule */ >> + hellcreek_write(hellcreek, TR_ESTCMD_ESTARM | TR_ESTCMD_ESTSWCFG | >> + ((ts.tv_sec & TR_ESTCMD_ESTSEC_MASK) << >> + TR_ESTCMD_ESTSEC_SHIFT), TR_ESTCMD); >> +} >> + >> +static struct hellcreek_schedule * >> +hellcreek_taprio_to_schedule(struct tc_taprio_qopt_offload *taprio) >> +{ >> + struct hellcreek_schedule *schedule; >> + size_t i; >> + >> + /* Allocate some memory first */ >> + schedule = kzalloc(sizeof(*schedule), GFP_KERNEL); > > struct hellcreek_schedule has no added value on top of struct > tc_taprio_qopt_offload (except for _maybe_ that overrun_ignore, which > currently you don't use and could therefore remove - arguably the > overrun_ignore flag should be user-visible if it ever was to be > configured, and therefore my argument still stands). overrun_ignore shouldn't be used. So, yes, that schedule can be removed. > > The reason why I'm making the point, though, is that you don't need to > allocate extra memory if you use the plain struct tc_taprio_qopt_offload. > You can use taprio_offload_get() to increase the reference count on the > structure that was passed to you, use it for as long as you need, and > just call taprio_offload_free() when you're done with it. Ah, didn't know that. That'll help. > >> + if (!schedule) >> + return ERR_PTR(-ENOMEM); >> + schedule->entries = kcalloc(taprio->num_entries, >> + sizeof(*schedule->entries), >> + GFP_KERNEL); >> + if (!schedule->entries) { >> + kfree(schedule); >> + return ERR_PTR(-ENOMEM); >> + } >> + >> + /* Construct hellcreek schedule */ >> + schedule->num_entries = taprio->num_entries; >> + schedule->base_time = taprio->base_time; >> + >> + for (i = 0; i < taprio->num_entries; ++i) { >> + const struct tc_taprio_sched_entry *t = &taprio->entries[i]; >> + struct hellcreek_gcl_entry *k = &schedule->entries[i]; >> + >> + k->interval = t->interval; >> + k->gate_states = t->gate_mask; >> + k->overrun_ignore = 0; >> + >> + /* Update complete cycle time */ >> + schedule->cycle_time += t->interval; > > You shouldn't need to do this, the cycle_time from struct > tc_taprio_qopt_offload should come properly populated. If it doesn't > it's a bug. True. > >> + } >> + >> + return schedule; >> +} >> + >> +static void hellcreek_free_schedule(struct hellcreek *hellcreek, int port) >> +{ >> + struct hellcreek_port *hellcreek_port = &hellcreek->ports[port]; >> + >> + kfree(hellcreek_port->current_schedule->entries); >> + kfree(hellcreek_port->current_schedule); >> + hellcreek_port->current_schedule = NULL; >> +} >> + >> +static bool hellcreek_schedule_startable(struct hellcreek *hellcreek, int port) >> +{ >> + struct hellcreek_port *hellcreek_port = &hellcreek->ports[port]; >> + s64 base_time_ns, current_ns; >> + >> + /* The switch allows a schedule to be started only eight seconds within >> + * the future. Therefore, check the current PTP time if the schedule is >> + * startable or not. >> + */ > > The future of whom? I expect that TR_ESTWR is relative to the most > recent integer multiple of 8 seconds, and not relative to "now". I think TR_ESTWR is relative to "now" and software has to make sure that is programmed correctly. > Otherwise you would never be able to phase-align taprio schedules with > other devices in the LAN. > > Doesn't this mean that you need to be extra careful at modulo-8-seconds > wraparounds of the current PTP time? Yes, indeed. > >> + >> + /* Use the "cached" time. That should be alright, as it's updated quite >> + * frequently in the PTP code. >> + */ >> + mutex_lock(&hellcreek->ptp_lock); >> + current_ns = hellcreek->seconds * NSEC_PER_SEC + hellcreek->last_ts; >> + mutex_unlock(&hellcreek->ptp_lock); >> + >> + /* Calculate difference to admin base time */ >> + base_time_ns = ktime_to_ns(hellcreek_port->current_schedule->base_time); >> + >> + if (base_time_ns - current_ns < (s64)8 * NSEC_PER_SEC) >> + return true; >> + >> + return false; >> +} >> + >> +static void hellcreek_start_schedule(struct hellcreek *hellcreek, int port) >> +{ >> + struct hellcreek_port *hellcreek_port = &hellcreek->ports[port]; >> + >> + /* First select port */ >> + hellcreek_select_tgd(hellcreek, port); >> + >> + /* Set admin base time and switch schedule */ >> + hellcreek_switch_schedule(hellcreek, >> + hellcreek_port->current_schedule->base_time); >> + >> + hellcreek_free_schedule(hellcreek, port); >> + >> + dev_dbg(hellcreek->dev, "ARMed EST timer for port %d\n", > > Is ARM used as an acronym for something here? Why not Armed? Yeah, it should be "Armed". > >> + hellcreek_port->port); >> +} >> + >> +static void hellcreek_check_schedule(struct work_struct *work) >> +{ >> + struct delayed_work *dw = to_delayed_work(work); >> + struct hellcreek_port *hellcreek_port; >> + struct hellcreek *hellcreek; >> + bool startable; >> + >> + hellcreek_port = dw_to_hellcreek_port(dw); >> + hellcreek = hellcreek_port->hellcreek; >> + >> + mutex_lock(&hellcreek->reg_lock); >> + >> + /* Check starting time */ >> + startable = hellcreek_schedule_startable(hellcreek, >> + hellcreek_port->port); >> + if (startable) { >> + hellcreek_start_schedule(hellcreek, hellcreek_port->port); >> + mutex_unlock(&hellcreek->reg_lock); >> + return; >> + } >> + >> + mutex_unlock(&hellcreek->reg_lock); >> + >> + /* Reschedule */ >> + schedule_delayed_work(&hellcreek_port->schedule_work, >> + HELLCREEK_SCHEDULE_PERIOD); >> +} >> + >> +static int hellcreek_port_set_schedule(struct dsa_switch *ds, int port, >> + struct tc_taprio_qopt_offload *taprio) >> +{ >> + struct hellcreek *hellcreek = ds->priv; >> + struct hellcreek_port *hellcreek_port; >> + struct hellcreek_schedule *schedule; >> + bool startable; >> + u16 ctrl; >> + >> + hellcreek_port = &hellcreek->ports[port]; >> + >> + /* Convert taprio data to hellcreek schedule */ >> + schedule = hellcreek_taprio_to_schedule(taprio); >> + if (IS_ERR(schedule)) >> + return PTR_ERR(schedule); >> + >> + dev_dbg(hellcreek->dev, "Configure traffic schedule on port %d\n", >> + port); >> + >> + /* First cancel delayed work */ >> + cancel_delayed_work_sync(&hellcreek_port->schedule_work); >> + >> + mutex_lock(&hellcreek->reg_lock); >> + >> + if (hellcreek_port->current_schedule) >> + hellcreek_free_schedule(hellcreek, port); >> + hellcreek_port->current_schedule = schedule; >> + >> + /* Then select port */ >> + hellcreek_select_tgd(hellcreek, port); >> + >> + /* Enable gating and set the admin state to forward everything in the >> + * mean time >> + */ >> + ctrl = (0xff << TR_TGDCTRL_ADMINGATESTATES_SHIFT) | TR_TGDCTRL_GATE_EN; >> + hellcreek_write(hellcreek, ctrl, TR_TGDCTRL); > > Hmm, "in the meantime"? When do these admin gate states become effective? > If possible, I expect that the currently operational schedule remains > operational exactly until the base_time of the newly installed one, > minus a possible cycle_time_extension. Yes, that's the expectation. And the hardware works like that. I'll have a look if this code is correct. > >> + >> + /* Cancel pending schedule */ >> + hellcreek_write(hellcreek, 0x00, TR_ESTCMD); >> + >> + /* Setup a new schedule */ >> + hellcreek_setup_gcl(hellcreek, port, schedule); >> + >> + /* Configure cycle time */ >> + hellcreek_set_cycle_time(hellcreek, schedule); > > As a general comment, if the hardware doesn't support the cycle time > extension when switching schedules, then you should NACK a non-zero > passed argument there. Ok. > >> + >> + /* Check starting time */ >> + startable = hellcreek_schedule_startable(hellcreek, port); >> + if (startable) { >> + hellcreek_start_schedule(hellcreek, port); >> + mutex_unlock(&hellcreek->reg_lock); >> + return 0; >> + } >> + >> + mutex_unlock(&hellcreek->reg_lock); >> + >> + /* Schedule periodic schedule check */ >> + schedule_delayed_work(&hellcreek_port->schedule_work, >> + HELLCREEK_SCHEDULE_PERIOD); >> + >> + return 0; >> +} >> + >> +static int hellcreek_port_del_schedule(struct dsa_switch *ds, int port) >> +{ >> + struct hellcreek *hellcreek = ds->priv; >> + struct hellcreek_port *hellcreek_port; >> + >> + hellcreek_port = &hellcreek->ports[port]; >> + >> + dev_dbg(hellcreek->dev, "Remove traffic schedule on port %d\n", port); >> + >> + /* First cancel delayed work */ >> + cancel_delayed_work_sync(&hellcreek_port->schedule_work); >> + >> + mutex_lock(&hellcreek->reg_lock); >> + >> + if (hellcreek_port->current_schedule) >> + hellcreek_free_schedule(hellcreek, port); >> + >> + /* Then select port */ >> + hellcreek_select_tgd(hellcreek, port); >> + >> + /* Disable gating and return to regular switching flow */ >> + hellcreek_write(hellcreek, 0xff << TR_TGDCTRL_ADMINGATESTATES_SHIFT, >> + TR_TGDCTRL); >> + >> + mutex_unlock(&hellcreek->reg_lock); >> + >> + return 0; >> +} >> + >> +static int hellcreek_port_setup_tc(struct dsa_switch *ds, int port, >> + enum tc_setup_type type, void *type_data) >> +{ >> + struct tc_taprio_qopt_offload *taprio = type_data; >> + struct hellcreek *hellcreek = ds->priv; >> + >> + if (type != TC_SETUP_QDISC_TAPRIO) >> + return -EOPNOTSUPP; >> + >> + /* Does this hellcreek version support Qbv in hardware? */ >> + if (!hellcreek->pdata->qbv_support) >> + return -EOPNOTSUPP; >> + >> + if (taprio->enable) >> + return hellcreek_port_set_schedule(ds, port, taprio); >> + >> + return hellcreek_port_del_schedule(ds, port); >> +} >> + >> static const struct dsa_switch_ops hellcreek_ds_ops = { >> .get_ethtool_stats = hellcreek_get_ethtool_stats, >> .get_sset_count = hellcreek_get_sset_count, >> @@ -1153,6 +1463,7 @@ static const struct dsa_switch_ops hellcreek_ds_ops = { >> .port_hwtstamp_get = hellcreek_port_hwtstamp_get, >> .port_prechangeupper = hellcreek_port_prechangeupper, >> .port_rxtstamp = hellcreek_port_rxtstamp, >> + .port_setup_tc = hellcreek_port_setup_tc, >> .port_stp_state_set = hellcreek_port_stp_state_set, >> .port_txtstamp = hellcreek_port_txtstamp, >> .port_vlan_add = hellcreek_vlan_add, >> @@ -1208,6 +1519,9 @@ static int hellcreek_probe(struct platform_device *pdev) >> >> port->hellcreek = hellcreek; >> port->port = i; >> + >> + INIT_DELAYED_WORK(&port->schedule_work, >> + hellcreek_check_schedule); >> } >> >> mutex_init(&hellcreek->reg_lock); >> diff --git a/drivers/net/dsa/hirschmann/hellcreek.h b/drivers/net/dsa/hirschmann/hellcreek.h >> index e81781ebc31c..7ffb1b33ff72 100644 >> --- a/drivers/net/dsa/hirschmann/hellcreek.h >> +++ b/drivers/net/dsa/hirschmann/hellcreek.h >> @@ -213,6 +213,20 @@ struct hellcreek_counter { >> const char *name; >> }; >> >> +struct hellcreek_gcl_entry { >> + u32 interval; >> + u8 gate_states; >> + bool overrun_ignore; >> +}; >> + >> +struct hellcreek_schedule { >> + struct hellcreek_gcl_entry *entries; >> + size_t num_entries; >> + ktime_t base_time; >> + u32 cycle_time; >> + int port; > > You don't need/use the "port" member. True. > >> +}; >> + >> struct hellcreek; >> >> /* State flags for hellcreek_port_hwtstamp::state */ >> @@ -246,6 +260,10 @@ struct hellcreek_port { >> >> /* Per-port timestamping resources */ >> struct hellcreek_port_hwtstamp port_hwtstamp; >> + >> + /* Per-port Qbv schedule information */ >> + struct hellcreek_schedule *current_schedule; >> + struct delayed_work schedule_work; >> }; >> >> struct hellcreek_fdb_entry { >> @@ -283,4 +301,8 @@ struct hellcreek { >> size_t fdb_entries; >> }; >> >> +#define HELLCREEK_SCHEDULE_PERIOD (2 * HZ) > > Is there a risk if you schedule rarer than this? The hellcreek->seconds > value is no longer accurate enough? The hw engineer recommended to use the two seconds. In theory, it can be increased. Thanks for the review. Thanks, Kurt