Add streams results metric data. #57

dvalinrh · 2026-01-09T14:41:53Z

Description

Streams PCP logging is broken: none of the metrics are being set properly. This PR fixes that issue.

Before/After Comparison

Before:
Seeing:
Logging results iteration_1 stream.36608k_iter_
Unexpected metric logged. Check for a Typo

  pmrep -p -a streams.0 openmetrics.workload.stream.36608k_iter_
  Invalid metric openmetrics.workload.stream.36608k_iter_ (PM_ERR_NAME Unknown metric name).

After:
The message above is no longer seen.
[root@ip-170-0-17-77 pcp_2026.01.09-12.59.58]# pmrep -p -a streams_size_stream.36608k_opt_level_2_threads_4_sockets_1.0 openmetrics.workload

          o.w.iteration  o.w.running  o.w.numthreads  o.w.runtime  o.w.throughput  o.w.latency  o.w.Copy  o.w.Scale  o.w.Add  o.w.Triad
12:49:56          1.000        1.000           0.000          NaN             NaN          NaN  21504.40  24648.500  25414.1  25307.400
12:49:57          1.000        1.000           0.000          NaN             NaN          NaN  21504.40  24648.500  25414.1  25307.400
12:49:58          1.000        1.000           0.000          NaN             NaN          NaN  21504.40  24648.500  25414.1  25307.400
12:49:59          0.000        0.000           0.000          NaN             NaN          NaN       NaN        NaN      NaN        NaN

Fix details:

Added an openmetrics file for Add, Copy, Scale, and Triad.

Moved the array size and iteration loops so that all iterations for an array size happen at once.

The pcp file now contains the information of interest: array size, opt level, number of threads, and number of sockets. We could probably push this all into one file, but it made more sense to me to have one file for each of those.

Fix the following line (separate commit):
info=`grep "${search_for}" ${file}* | tr -s " " | sed "s/ /:/g" | cut -d: -f 4`
to be:
info=`grep -h "${search_for}:" ${file}* | tr -s " " | sed "s/ /:/g" | tr -s ':' | cut -d: -f 2`

The -h keeps grep from printing the file name as the first field when it works with multiple files.
The tr -s ':' ensures that we have only a single ':' rather than several in a row.
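
A minimal sketch of why the field number was fragile, assuming STREAM-style result lines of the form `Copy:   <rate>   <time>`. The helper names `extract_old` and `extract_new`, the temporary directory, and the sample file contents are made up for illustration and are not part of the wrapper.

```shell
#!/usr/bin/env bash
# Illustration only: the sample files and their contents are made up.

# Original pipeline: which field holds the value depends on whether
# grep prefixes each match with a file name.
extract_old() {
    search_for=$1; file=$2
    grep "${search_for}" "${file}"* | tr -s " " | sed "s/ /:/g" | cut -d: -f 4
}

# Fixed pipeline: -h drops the file name, and tr -s ':' collapses the
# "::" left behind by the space-to-colon substitution.
extract_new() {
    search_for=$1; file=$2
    grep -h "${search_for}:" "${file}"* | tr -s " " | sed "s/ /:/g" | tr -s ':' | cut -d: -f 2
}

workdir=$(mktemp -d)

# Case 1: a single file matches the glob, so grep prints no file name.
printf 'Copy:           21504.4     0.027915\n' > "${workdir}/stream.out.1"
one_file_old=$(extract_old Copy "${workdir}/stream.out")
one_file_new=$(extract_new Copy "${workdir}/stream.out")

# Case 2: two files match, so plain grep prefixes "<file name>:".
printf 'Copy:           21629.0     0.027800\n' > "${workdir}/stream.out.2"
two_files_old=$(extract_old Copy "${workdir}/stream.out")
two_files_new=$(extract_new Copy "${workdir}/stream.out")

echo "one file:  old=${one_file_old} new=${one_file_new}"
echo "two files: old=$(echo "${two_files_old}" | tr '\n' ' ') new=$(echo "${two_files_new}" | tr '\n' ' ')"

rm -rf "${workdir}"
```

With a single matching file the original pipeline picks up the column after the rate, because there is no file name in front to shift the fields. The fixed pipeline returns the rate in both cases.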

Clerical Stuff

This closes #56

Relates to JIRA: RPOPC-758

Test results
Command executed
/home/ec2-user/workloads/streams-wrapper-2.1/streams/streams_run --run_user ec2-user --home_parent /home --iterations 1 --tuned_setting tuned_none_sys_file_ --host_config "m5.xlarge" --sysname "m5.xlarge" --sys_type aws --iterations 5 - --use_pcp

csv file

Test general meta start

Test: streams

Results version: 1.0

Host: m5.xlarge

Sys environ: aws

Tuned: virtual-guest

OS: 5.14.0-611.5.1.el9_7.x86_64

Numa nodes: 1

CPU family: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz

Number cpus: 4

Memory: 15899880kB

Test general meta end

Test meta data start

Optimization level: O2

kernel_rev --meta_output numa_nodes

number_cpus

Core(s)_per_socket

Model_name

streams_version_# 5.10

Test meta data end

1 Socket
Array sizes:36608k:73216k:146432k:292864k
Copy:21504:21629:21560:22514
Scale:24733:24831:24855:24882
Add:25572:25603:25648:25671
Triad:25444:25540:25570:25597

Test meta data start

Optimization level: O3

kernel_rev --meta_output numa_nodes

number_cpus

Core(s)_per_socket

Model_name

streams_version_# 5.10

Test meta data end

1 Socket
Array sizes:36608k:73216k:146432k:292864k
Copy:21704:21587:21553:22478
Scale:24928:24933:24948:24914
Add:25742:25502:25823:25426
Triad:25614:25470:25735:25381

partial pcp output

pmrep -p -a streams_size_stream.36608k_opt_level_2_threads_4_sockets_1.0 openmetrics.workload
12:49:56 1.000 1.000 0.000 NaN NaN NaN 21504.40 24648.500 25414.1 25307.400
12:49:57 1.000 1.000 0.000 NaN NaN NaN 21504.40 24648.500 25414.1 25307.400
12:49:58 1.000 1.000 0.000 NaN NaN NaN 21504.40 24648.500 25414.1 25307.400
12:49:59 0.000 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN

=========================
Streams run output.

streams_x_out.txt

github-actions · 2026-01-09T14:42:02Z

This relates to RPOPC-758

malucius-rh

LGTM

frival

I'm inclined to approve but I have a couple questions first, because some things aren't made readily clear (and github's mangling of the formatting doesn't help). Take a deep breath folks.

First, this looks like it results in one archive per socket count, with all optimizations and array sizes and iterations for each array size in the same archive, is that correct? That means for our "normal" 2 socket systems we'd get two archives per run. It looks like we're recording separate metrics for each of Copy/Add/Scale/Triad which makes me happy. Do we want to have one archive per socket count (or are my old eyes just giving me a challenge with alignment in the diff output)? I'm not asking if that's the easiest way to do it but whether that's ideal when we have to use this for analysis.

Second, I don't see a log of the run, I see some logged output that github has mangled beyond belief but that's not the same. Ideally we'd be showing the full run output (e.g. a bash -x ./streams_run) - we got so aggressive about swallowing output because of what it did to the logs and screen of a multi-system run that I think we're now missing very valuable output that could make things better. I'd like that so we can ensure there are no other weird errors or warnings that we're silently ignoring unintentionally.

dvalinrh · 2026-01-15T12:52:16Z

Test output is located in https://github.com/user-attachments/files/24529237/streams_O2_virtual-guest.txt (as in the submit). We have to make a decision, we can put the output directly in the pr or in an attachment. My take, small output in the pr, large amount in the attachment.

pcp archive name
streams_size_stream.16384k_opt_level_2_threads_16_sockets_1.0
So we have an archive for each size, Opt level and thread count. We could bury this all into one archive, or grouping, but it will make the pmrep output harder to read and locate things. No particular preference, just the way I went.
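
As a rough sketch, the fields could be recovered from an archive name like the one above when locating or comparing runs. The function name `parse_archive_name` is hypothetical, and the pattern assumes the name layout shown here, including a fixed trailing `.0`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an archive name of the form
#   streams_size_stream.<size>_opt_level_<N>_threads_<N>_sockets_<N>.0
# into "size opt_level threads sockets". Prints nothing if the name
# does not match the assumed layout.
parse_archive_name() {
    echo "$1" | sed -n \
        's/^streams_size_stream\.\([0-9][0-9]*k\)_opt_level_\([0-9][0-9]*\)_threads_\([0-9][0-9]*\)_sockets_\([0-9][0-9]*\)\.0$/\1 \2 \3 \4/p'
}

fields=$(parse_archive_name "streams_size_stream.16384k_opt_level_2_threads_16_sockets_1.0")
echo "size opt_level threads sockets: ${fields}"
```

Keeping the knobs in the name means a shell glob or a loop over archive names can pick out, say, every array size for one opt level without opening the archives.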

frival · 2026-01-15T13:58:16Z

> Test output is located in https://github.com/user-attachments/files/24529237/streams_O2_virtual-guest.txt (as in the submit). We have to make a decision, we can put the output directly in the pr or in an attachment. My take, small output in the pr, large amount in the attachment.

That's not the output of the full wrapper, that's only part of the run.

> pcp archive name
> streams_size_stream.16384k_opt_level_2_threads_16_sockets_1.0
> So we have an archive for each size, Opt level and thread count. We could bury this all into one archive, or grouping, but it will make the pmrep output harder to read and locate things. No particular preference, just the way I went.

I agree it will make the pmrep output harder to read, but doing it this way also makes comparing across sizes more complicated as we have to merge archives. This way is easier to parse the pmrep output, the other is easier to compare different sizes etc. Both have up sides and down sides, I just want to be sure we're intentional about what we're choosing here vs. taking the first available option.

This also is very confusing in the log, it's probably expected but it's darned confusing:

Logging results Copy 21577.4
Copy NaN
Logging results Scale 24666.2
Scale NaN
Logging results Add 25588.9
Add NaN
Logging results Triad 25439.0
Triad NaN

Finally, because I'm being pedantic this week, ideally for a test like this we'd also have logs from a multi-NUMA-node system and not just a 4 vCPU instance. We're requiring those be run according to the new rules, so including the logs from them would be helpful.

malucius-rh · 2026-01-15T14:39:12Z

> > pcp archive name
> > streams_size_stream.16384k_opt_level_2_threads_16_sockets_1.0
> > So we have an archive for each size, Opt level and thread count. We could bury this all into one archive, or grouping, but it will make the pmrep output harder to read and locate things. No particular preference, just the way I went.
>
> I agree it will make the pmrep output harder to read, but doing it this way also makes comparing across sizes more complicated as we have to merge archives. This way is easier to parse the pmrep output, the other is easier to compare different sizes etc. Both have up sides and down sides, I just want to be sure we're intentional about what we're choosing here vs. taking the first available option.

If we ever get support for annotations in PCP where we can inject things like " here beginneth 16384k O2" into the archive the One Big Archive method will become easier to handle. Until then for cases like this where we effectively have sizes * optlevels separate tests which happen to be grouped for runtime convenience we may want to consider (even though I really don't like the approach) logging the "config knobs" as metrics before the individual runs, treating them as ugly annotations.

dvalinrh added 3 commits January 9, 2026 07:37

Add open metrics file for streams.

c3d520c

Update run_stream to handle pcp properly.

515a7e9

Remove spaces in lines, should be tabs.

436af8f

dvalinrh requested review from frival and malucius-rh January 9, 2026 14:41

dvalinrh changed the title ~~Fix streams pcp~~ Add streams results metric data. Jan 13, 2026

malucius-rh previously approved these changes Jan 14, 2026

frival reviewed Jan 14, 2026

dvalinrh requested a review from frival January 15, 2026 12:52

Fix the line info=`grep "${search_for}... so we will find only the asked for string, and we filter out the line properly.

013fbe6

dvalinrh dismissed malucius-rh’s stale review via 013fbe6 January 23, 2026 13:03

Update run_stream so the pcp metrics for the streams test are all recorded at once.

5f91d31

dvalinrh requested a review from malucius-rh January 23, 2026 13:27
