-
Notifications
You must be signed in to change notification settings - Fork 8.4k
Description
Describe the bug
The routine k_thread_time_slice_set() has a race condition.
This race condition exists in the window of time between which the time slice has been marked as expired in the time slice timeout handler slice_timeout() and the time at which the thread specific callout is invoked.
The typical time slice timeout flow looks like this ...
- Ticks and associated timeouts are processed in sys_clock_announce().
- Their associated handlers are invoked with the timeout spinlock unlocked.
- When all the timeout processing is done, call z_time_slice() with the timeout spinlock unlocked
- z_time_slice() may call the thread specific timeout handler if the time slice timeout for the current CPU expired.
In an SMP system, it is very similar except that ..
A. The time slice handler triggers an IPI if the expired time slice timeout belonged to another CPU.
B. The IPI handler for the targeted CPU(s) call z_time_slice() to handle the expired time slice.
The Scenario.
z_time_slice() calls curr->base.slice_expired(curr, curr->base.slice_data) with interrupts unlocked which means that an ISR could call k_thread_time_slice_set() and change slice_expired to NULL leading to an exception as we call a NULL function pointer. This may appear to be easily rectifiable by caching the function pointer before we unlock interrupts. Although this should be done, there is more to it than that.
A deeper look into the scenario
Caching the function pointer is not enough. The time slice timeout handler is invoked from sys_clock_announce() along with an unpredictable number of other timeout handlers (with the timeout spinlock unlocked). This opens the doors for other timeout handlers to call k_thread_time_slice_set() AFTER the timeout has occurred leading to either no thread specific time slice handler being called, or one that was specified after the timeout event occurred.
When we add SMP into it, the surface attack area expands to include threads instead of just handlers from timeouts or other ISRs.
To correct this, I think that we should put a restriction on k_thread_time_slice_set() such that it fails and return an error code such as -EBUSY if the thread is the current thread on any CPU.
Regression
- This is a regression.
Steps to reproduce
No response
Relevant log output
Impact
Annoyance – Minor irritation; no significant impact on usability or functionality.
Environment
No response
Additional Context
No response