MEDIUM: task: add a new tasklet class for real-time: TL_RT
This adds new class TL_RT, which is processed before other queues for
one (and only one) tasklet featuring the TASK_RT flag. This is meant to
process real time wakeups under load with even less latency. We only
process one entry to make sure it will not be abused for unimportant
stuff, and if tune.sched.low-latency is set, we also avoid picking more
tasks from the current run queues and looping after the first call to
run_tasks_from_list().
Measurements under a load of 10k concurrent conns injection at 10 Gbps
(~58k 20kB objects/s) on 4 threads and with task profiling enabled shows
that the average wakeup latency for wakeups every 10ms dropped from 220
microseconds to 1.8 microsecond, and even ~550 nanoseconds when
tune.sched.low-latency is set, or 400 times less.