Skip to content

kernel: k_thread_time_slice_set() has race conditions #100975

@peter-mitsis

Description

@peter-mitsis

Describe the bug

The routine k_thread_time_slice_set() has a race condition.

This race condition exists in the window of time between which the time slice has been marked as expired in the time slice timeout handler slice_timeout() and the time at which the thread specific callout is invoked.

The typical time slice timeout flow looks like this ...

  1. Ticks and associated timeouts are processed in sys_clock_announce().
  2. Their associated handlers are invoked with the timeout spinlock unlocked.
  3. When all the timeout processing is done, call z_time_slice() with the timeout spinlock unlocked
  4. z_time_slice() may call the thread specific timeout handler if the time slice timeout for the current CPU expired.

In an SMP system, it is very similar except that ..
A. The time slice handler triggers an IPI if the expired time slice timeout belonged to another CPU.
B. The IPI handler for the targeted CPU(s) call z_time_slice() to handle the expired time slice.

The Scenario.
z_time_slice() calls curr->base.slice_expired(curr, curr->base.slice_data) with interrupts unlocked which means that an ISR could call k_thread_time_slice_set() and change slice_expired to NULL leading to an exception as we call a NULL function pointer. This may appear to be easily rectifiable by caching the function pointer before we unlock interrupts. Although this should be done, there is more to it than that.

A deeper look into the scenario
Caching the function pointer is not enough. The time slice timeout handler is invoked from sys_clock_announce() along with an unpredictable number of other timeout handlers (with the timeout spinlock unlocked). This opens the doors for other timeout handlers to call k_thread_time_slice_set() AFTER the timeout has occurred leading to either no thread specific time slice handler being called, or one that was specified after the timeout event occurred.

When we add SMP into it, the surface attack area expands to include threads instead of just handlers from timeouts or other ISRs.

To correct this, I think that we should put a restriction on k_thread_time_slice_set() such that it fails and return an error code such as -EBUSY if the thread is the current thread on any CPU.

Regression

  • This is a regression.

Steps to reproduce

No response

Relevant log output

Impact

Annoyance – Minor irritation; no significant impact on usability or functionality.

Environment

No response

Additional Context

No response

Metadata

Metadata

Assignees

Labels

area: KernelbugThe issue is a bug, or the PR is fixing a bug

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions