Conversation
ArneTR commented Sep 9, 2025

This PR reworks the current model implementation, using an OLS model to estimate the weights.

The implementation is done in Python with the statsmodels library, which makes the model formulas more readable than sklearn's OLS implementation.

OLS was chosen because it offers the highest interpretability, allowing direct conclusions about component energy factors.

This PR also adds sample data from my machine (Framebook) to support project development.

Suggested TODOs for further exploration

  • Validate the OLS assumptions
  • Incorporate a ridge (L2) model for comparison
  • Consider alternative models, e.g. logarithmic models, which correlate more closely with the energy consumption of a CPU
  • Generally, record a power curve of the system, check whether it is linear, logarithmic, or something else, and draw conclusions from that
  • Validate how well the mixed model holds up across repetitions; currently this was only done once
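For reference, a minimal sketch of what a statsmodels-based OLS fit looks like. The column names (cpu_instructions, wakeups) and the synthetic data are illustrative assumptions, not the PR's actual log format:

```python
# Sketch: fit an OLS power model with statsmodels' formula API.
# Feature names and synthetic data are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "cpu_instructions": rng.uniform(0, 1e7, n),
    "wakeups": rng.uniform(0, 500, n),
})
# Synthetic target: 1 W baseline plus per-feature contributions plus noise.
df["power_w"] = (1.0
                 + 2e-7 * df["cpu_instructions"]
                 + 5e-3 * df["wakeups"]
                 + rng.normal(0, 0.1, n))

# The formula interface keeps the model specification readable.
model = smf.ols("power_w ~ cpu_instructions + wakeups", data=df).fit()
print(model.params)    # intercept (idle baseline) and per-feature weights
print(model.rsquared)
```

The fitted weights map directly to per-component energy factors, which is the interpretability argument made above.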


ArneTR commented Sep 9, 2025

I also want to share some experience from playing around with different models:

Workload design experiences

Computer systems draw power very differently under different workloads. We therefore explored whether to use separate, disparate workloads or one large mixed workload.

Advantages of the separate workloads could be:

  • Better fit and better prediction
  • Might work especially well in edge cases
  • Might need less sample data to fit

Corresponding disadvantages:

  • When conditions change, the fit might be unusable. We tried a fit on a compute workload where the intercept was already 16 W (although the system drew 4 W at idle). The model therefore had no way of knowing what would happen when the CPU was idle.
  • At prediction time you must know which model to choose when. This is hard without domain knowledge: a rule like "switch to the compute model when instructions are > 10,000,000" might be misleading on a system with a higher base frequency, or on systems that have only one core and cannot fully sleep it.

Model design experiences

  • Fitting on all variables leads to high collinearity, which means repeated runs produce highly different weights. This can be combated by larger sample sizes, a different sampling interval, or by simply dropping variables or combining them.

    • Needs to be explored further
  • Idle workloads are dominated by wakeups. If a simple model with only wakeups is fitted, idle is properly estimated at around 1 W for the intercept plus a 2-3 W contribution from the wakeups.
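The collinearity instability described above can be made visible with the condition number of the standardized design matrix. A numpy-only sketch with illustrative feature names (the near-duplicate cycles column stands in for whichever counters track each other in the real logs):

```python
# Sketch: diagnose collinearity via the condition number of the
# standardized design matrix. Feature names are illustrative.
import numpy as np

rng = np.random.default_rng(42)
n = 300
instructions = rng.uniform(1e5, 1e7, n)
# A nearly redundant feature: cycles closely track instructions,
# which is what produces unstable weights across repeated fits.
cycles = instructions * 1.1 + rng.normal(0, 1e3, n)
wakeups = rng.uniform(0, 500, n)

def cond_number(X):
    # Standardize columns first so scale differences don't dominate.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.linalg.cond(Xs)

full = np.column_stack([instructions, cycles, wakeups])
reduced = np.column_stack([instructions, wakeups])
print(cond_number(full))     # very large: collinear design
print(cond_number(reduced))  # small: dropping the redundant column helps
```

A large jump between the two numbers is exactly the "drop variables or combine them" remedy in action.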


ArneTR commented Sep 9, 2025

@ribalba ping for you


ArneTR commented Sep 17, 2025

I just upgraded the PR and added some more transformations.

Try it on your box and tell me the values.

Here is what I do:

sudo pkill -f energy-logger.sh
sudo ./energy-logger.sh & # will output the file it will write to
python3 run-workload.py mixed
sudo pkill -f energy-logger.sh

# stdout now shows the log file name, e.g. /tmp/energy-0yb45hR0.log

# now you run the exact same stuff again

sudo pkill -f energy-logger.sh
sudo ./energy-logger.sh & # will output the file it will write to
python3 run-workload.py mixed
sudo pkill -f energy-logger.sh

# stdout now shows a different log file name, e.g. /tmp/energy-890342da.log

# Then you can run:

python3 model.py /tmp/energy-0yb45hR0.log --no-validate --fit OLS --predict /tmp/energy-890342da.log

This effectively fits a model on the first benchmark run and then makes an out-of-sample prediction with the second run.

Also, please run the last command again with --log appended.

Eager for the results :)


ArneTR commented Sep 17, 2025

@ribalba


ArneTR commented Oct 27, 2025

@ribalba I added a prediction stage that now also back-transforms the data from the logarithm space.

It can now be iterated on quite quickly. I added examples for using model.py and its predict stage with the included energy sample data from the newly added endpoint at /sys/kernel/debug/energy/sys.
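The log-space fit with back-transformation can be sketched in plain numpy. The synthetic data and curve shape are illustrative assumptions; the real pipeline lives in model.py:

```python
# Sketch: fit in log space, then back-transform predictions with exp,
# so the output is on the original watt scale again.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 100, 200)
# Synthetic target with multiplicative growth: log(y) is ~linear in x.
y = np.exp(0.02 * x) + rng.normal(0, 0.01, 200)

# Ordinary least squares on the log-transformed target.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

log_pred = X @ coef
pred = np.exp(log_pred)  # back-transform from log space to watts
print(coef)
```

Note that exp of an unbiased log-space prediction is slightly biased on the original scale; that is one reason the transformation sacrifices some explainability.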

This can be merged now if you feel the functionality contributes to the tool.

Happy for your feedback.

TODOs (TBD in a different PR though):

  • Automatic validation of model assumptions (condition number, F-Statistic, normal distribution of errors etc.)
    • Statsmodels already calculates all these numbers and outputs them in its statistics summary. Currently they must be interpreted by the reader, but thresholds could be introduced to make this automatic
  • Implementing more and better models
    • The model chokes on numerical errors. Log transformation helps quite a bit but sacrifices explainability and requires a log transformation in the kernel.
    • Thus other models and other transformations should be explored to make the model more viable
    • Alternatively, models could be situational: a compute model for when compute is happening and an idle model for when the system is idle.
  • Reducing overhead
    • At a 99 ms sampling interval the kernel module requires 10% of one core. That is a negligible load on the 10-core system I am testing on, but combined with 10 Hz sampling from userspace, the 5% system-load margin (above which this PR assumes the system is no longer idle) is quickly reached

ArneTR marked this pull request as ready for review October 27, 2025 06:50

ArneTR commented Jan 10, 2026

I just pushed another big chunk of implementations to the branch:

Models

  • Implemented XGBoost Model
  • Implemented Ridge Model
  • Implemented Huber Regressor

Data Preparation

  • Implemented Standard Scaler
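What the standard-scaling step does, sketched in plain numpy (this mirrors sklearn's StandardScaler with default settings; the feature scales are illustrative):

```python
# Sketch: standardize features to zero mean and unit variance before a
# linear fit, so weights become comparable across features.
import numpy as np

rng = np.random.default_rng(3)
# Features on wildly different scales (e.g. instructions vs. wakeups).
X = np.column_stack([rng.uniform(0, 1e7, 100), rng.uniform(0, 500, 100)])

mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std  # equivalent to StandardScaler.fit_transform

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```

At prediction time the same mean and std from the training data must be reused, otherwise the fitted weights no longer match the inputs.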

Results

python3 model.py ../sample_data/energy_logs/mixed_workload.log --dump-raw --dump-diff --predict ../sample_data/energy_logs/mixed_workload_2.log --add-intercept --features extra --model xgboost --dump-predictions --no-validate
=> 82% Accuracy

python3 model.py ../sample_data/energy_logs/mixed_workload.log --dump-raw --dump-diff --predict ../sample_data/energy_logs/mixed_workload_2.log --add-intercept --features normal --model ols --dump-predictions --no-validate
=> 66% Accuracy

Summary

  • Huber and Ridge provide no benefit over a scaled OLS
  • XGBoost far outperforms all other models on mixed data. However, the model becomes uninterpretable.
    • If we want to use it in the model output, we need a user-space conversion step after the kernel output

Possible next steps

  • Train separate OLS models for idle, compute, etc. and see if we can get OLS to > 80% accuracy
  • Massage the dataset even more: remove outliers, apply different scaling strategies, etc.
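The per-regime idea can be sketched as fitting one line per workload regime and dispatching on a load feature at prediction time. The 10% utilization threshold and the synthetic power curve are illustrative assumptions, not measured values:

```python
# Sketch: one OLS fit per regime (idle vs. compute), with a simple
# dispatch rule at prediction time. All numbers are synthetic.
import numpy as np

rng = np.random.default_rng(4)
util = rng.uniform(0, 1, 400)
# Synthetic power curve: low idle floor, much steeper compute slope
# with a higher intercept (cf. the 16 W compute intercept seen above).
power = np.where(util < 0.1, 1.0 + 2.0 * util, 4.0 + 20.0 * util)
power += rng.normal(0, 0.05, 400)

def fit_line(x, y):
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

idle = util < 0.1
coef_idle = fit_line(util[idle], power[idle])
coef_comp = fit_line(util[~idle], power[~idle])

def predict(u):
    # Dispatch rule; the 0.1 threshold is a placeholder assumption.
    c = coef_idle if u < 0.1 else coef_comp
    return c[0] + c[1] * u

print(predict(0.05), predict(0.8))
```

The open problem noted earlier still applies: the dispatch rule itself needs domain knowledge, otherwise a threshold tuned on one machine misleads on another.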

Call me :)
