Conversation

@dvalinrh
Contributor

@dvalinrh dvalinrh commented Jan 9, 2026

Description

PCP support in the streams wrapper is broken: none of the metrics are being set properly. This PR fixes that.

Before/After Comparison

Before:
Seeing:
Logging results iteration_1 stream.36608k_iter_
Unexpected metric logged. Check for a Typo

  pmrep -p -a streams.0 openmetrics.workload.stream.36608k_iter_
  Invalid metric openmetrics.workload.stream.36608k_iter_ (PM_ERR_NAME Unknown metric name).

After
Above message not seen.
[root@ip-170-0-17-77 pcp_2026.01.09-12.59.58]# pmrep -p -a streams_size_stream.36608k_opt_level_2_threads_4_sockets_1.0 openmetrics.workload

          o.w.iteration  o.w.running  o.w.numthreads  o.w.runtime  o.w.throughput  o.w.latency  o.w.Copy  o.w.Scale  o.w.Add  o.w.Triad
12:49:56          1.000        1.000           0.000          NaN             NaN          NaN  21504.40  24648.500  25414.1  25307.400
12:49:57          1.000        1.000           0.000          NaN             NaN          NaN  21504.40  24648.500  25414.1  25307.400
12:49:58          1.000        1.000           0.000          NaN             NaN          NaN  21504.40  24648.500  25414.1  25307.400
12:49:59          0.000        0.000           0.000          NaN             NaN          NaN       NaN        NaN      NaN        NaN

Fix details:

Added openmetric file for Add, Copy, Scale, Triad

Moved the array size and iteration loops so that all iterations for a given array size run together.

The pcp file name now carries the information of interest: array size, opt level, number of threads, and number of sockets. We could probably push all of this into one file, but it made more sense to me to have one file for each combination.

Fix the following line (separate commit):
info=`grep "${search_for}" ${file}* | tr -s " " | sed "s/ /:/g" | cut -d: -f 4`
to be
info=`grep -h "${search_for}:" ${file}* | tr -s " " | sed "s/ /:/g" | tr -s ':' | cut -d: -f 2`

The -h stops grep from putting the file name in the first field when we are working with multiple files.
The tr -s ':' collapses any run of consecutive ':' into a single ':'.
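
For anyone who wants to check the effect of -h and tr -s ':' locally, here is a minimal sketch. The two files and their values are made up for illustration (they only mimic the "Copy:" line that STREAM prints), and the command substitution is written as $( ).

```shell
#!/bin/sh
# Illustration only: these files and values are hypothetical stand-ins for
# the per-run STREAM output files that the wrapper searches.
workdir=$(mktemp -d)
printf 'Copy:           21504.4     0.027263\n' > "${workdir}/stream.out.1"
printf 'Copy:           21629.1     0.027105\n' > "${workdir}/stream.out.2"

file="${workdir}/stream.out"
search_for="Copy"

# Without -h, grep prefixes every match with "<file name>:" when it searches
# more than one file, so field 2 is the label rather than the value.
with_names=$(grep "${search_for}:" ${file}* | tr -s " " | sed "s/ /:/g" | tr -s ':' | cut -d: -f 2)

# With -h the prefix is gone; tr -s ':' then collapses the "::" produced by
# "Copy: " so that the value lands in field 2.
info=$(grep -h "${search_for}:" ${file}* | tr -s " " | sed "s/ /:/g" | tr -s ':' | cut -d: -f 2)

echo "without -h: ${with_names}"
echo "with -h:    ${info}"
rm -rf "${workdir}"
```

With a single matching file grep prints no file name prefix at all, so the field position would otherwise depend on how many files matched the glob.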

Clerical Stuff

This closes #56

Relates to JIRA: RPOPC-758

Test results
Command executed
/home/ec2-user/workloads/streams-wrapper-2.1/streams/streams_run --run_user ec2-user --home_parent /home --iterations 1 --tuned_setting tuned_none_sys_file_ --host_config "m5.xlarge" --sysname "m5.xlarge" --sys_type aws --iterations 5 - --use_pcp

csv file

Test general meta start

Test: streams

Results version: 1.0

Host: m5.xlarge

Sys environ: aws

Tuned: virtual-guest

OS: 5.14.0-611.5.1.el9_7.x86_64

Numa nodes: 1

CPU family: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz

Number cpus: 4

Memory: 15899880kB

Test general meta end

Test meta data start

Optimization level: O2

kernel_rev --meta_output numa_nodes

number_cpus

Core(s)_per_socket

Model_name

streams_version_# 5.10

Test meta data end

1 Socket
Array sizes:36608k:73216k:146432k:292864k
Copy:21504:21629:21560:22514
Scale:24733:24831:24855:24882
Add:25572:25603:25648:25671
Triad:25444:25540:25570:25597

Test meta data start

Optimization level: O3

kernel_rev --meta_output numa_nodes

number_cpus

Core(s)_per_socket

Model_name

streams_version_# 5.10

Test meta data end

1 Socket
Array sizes:36608k:73216k:146432k:292864k
Copy:21704:21587:21553:22478
Scale:24928:24933:24948:24914
Add:25742:25502:25823:25426
Triad:25614:25470:25735:25381

partial pcp output

pmrep -p -a streams_size_stream.36608k_opt_level_2_threads_4_sockets_1.0 openmetrics.workload
12:49:56 1.000 1.000 0.000 NaN NaN NaN 21504.40 24648.500 25414.1 25307.400
12:49:57 1.000 1.000 0.000 NaN NaN NaN 21504.40 24648.500 25414.1 25307.400
12:49:58 1.000 1.000 0.000 NaN NaN NaN 21504.40 24648.500 25414.1 25307.400
12:49:59 0.000 0.000 0.000 NaN NaN NaN NaN NaN NaN NaN

=========================
Streams run output.

streams_x_out.txt

@dvalinrh dvalinrh requested review from frival and malucius-rh January 9, 2026 14:41
@github-actions

github-actions bot commented Jan 9, 2026

This relates to RPOPC-758

@dvalinrh dvalinrh changed the title Fix streams pcp Add streams results metric data. Jan 13, 2026
malucius-rh
malucius-rh previously approved these changes Jan 14, 2026
@malucius-rh malucius-rh left a comment

LGTM

Contributor

@frival frival left a comment

I'm inclined to approve but I have a couple questions first, because some things aren't made readily clear (and github's mangling of the formatting doesn't help). Take a deep breath folks.

First, this looks like it results in one archive per socket count, with all optimizations and array sizes and iterations for each array size in the same archive, is that correct? That means for our "normal" 2 socket systems we'd get two archives per run. It looks like we're recording separate metrics for each of Copy/Add/Scale/Triad which makes me happy. Do we want to have one archive per socket count (or are my old eyes just giving me a challenge with alignment in the diff output)? I'm not asking if that's the easiest way to do it but whether that's ideal when we have to use this for analysis.

Second, I don't see a log of the run, I see some logged output that github has mangled beyond belief but that's not the same. Ideally we'd be showing the full run output (e.g. a bash -x ./streams_run) - we got so aggressive about swallowing output because of what it did to the logs and screen of a multi-system run that I think we're now missing very valuable output that could make things better. I'd like that so we can ensure there are no other weird errors or warnings that we're silently ignoring unintentionally.

@dvalinrh
Contributor Author

Test output is located in https://github.com/user-attachments/files/24529237/streams_O2_virtual-guest.txt (as in the submit). We have to make a decision, we can put the output directly in the pr or in an attachment. My take, small output in the pr, large amount in the attachment.

pcp archive name
streams_size_stream.16384k_opt_level_2_threads_16_sockets_1.0
So we have an archive for each size, Opt level and thread count. We could bury this all into one archive, or grouping, but it will make the pmrep output harder to read and locate things. No particular preference, just the way I went.
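
As a rough illustration of that naming scheme, an archive name could be composed as below. This is a minimal sketch, not the wrapper's actual code: archive_name is a hypothetical helper, and the trailing ".0" in the name above is assumed to be the data volume suffix added when the archive is written, so it is left out here.

```shell
#!/bin/sh
# Hypothetical helper, not the wrapper's actual code: compose an archive
# base name from the run parameters, following the pattern quoted above.
archive_name() {
    size=$1
    opt=$2
    threads=$3
    sockets=$4
    printf 'streams_size_%s_opt_level_%s_threads_%s_sockets_%s\n' \
        "$size" "$opt" "$threads" "$sockets"
}

name=$(archive_name stream.16384k 2 16 1)
echo "${name}"
```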

@dvalinrh dvalinrh requested a review from frival January 15, 2026 12:52
@frival
Contributor

frival commented Jan 15, 2026

Test output is located in https://github.com/user-attachments/files/24529237/streams_O2_virtual-guest.txt (as in the submit). We have to make a decision, we can put the output directly in the pr or in an attachment. My take, small output in the pr, large amount in the attachment.

That's not the output of the full wrapper, that's only part of the run.

pcp archive name
streams_size_stream.16384k_opt_level_2_threads_16_sockets_1.0
So we have an archive for each size, Opt level and thread count. We could bury this all into one archive, or grouping, but it will make the pmrep output harder to read and locate things. No particular preference, just the way I went.

I agree it will make the pmrep output harder to read, but doing it this way also makes comparing across sizes more complicated as we have to merge archives. This way is easier to parse the pmrep output, the other is easier to compare different sizes etc. Both have up sides and down sides, I just want to be sure we're intentional about what we're choosing here vs. taking the first available option.
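
To make the trade-off concrete, here is a minimal sketch of what comparing across sizes looks like with one archive per configuration. It assumes the archive naming shown above; the placeholder files are empty stand-ins rather than real archives, and the pmrep call is left as a comment because it needs PCP and real data. The metric name openmetrics.workload.Copy is inferred from the o.w.Copy column header and is an assumption.

```shell
#!/bin/sh
# Hypothetical layout: empty placeholder files named like the archives in
# this PR, so the loop below has something to iterate over.
workdir=$(mktemp -d)
for size in 36608k 73216k; do
    : > "${workdir}/streams_size_stream.${size}_opt_level_2_threads_4_sockets_1.0"
done

# Walk every archive for opt level 2 and recover the array size from its name.
sizes=""
for archive in "${workdir}"/streams_size_*_opt_level_2_*.0; do
    size=$(basename "${archive}" | sed 's/^streams_size_stream\.\([0-9]*k\)_.*/\1/')
    sizes="${sizes}${sizes:+ }${size}"
    # With PCP installed and real archives, each one could be queried here,
    # for example:
    #   pmrep -p -a "${archive}" openmetrics.workload.Copy
done

echo "${sizes}"
rm -rf "${workdir}"
```

The per-size results then have to be stitched together by the caller, which is the extra step that a single combined archive would avoid.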

This also is very confusing in the log, it's probably expected but it's darned confusing:

Logging results Copy 21577.4
Copy NaN
Logging results Scale 24666.2
Scale NaN
Logging results Add 25588.9
Add NaN
Logging results Triad 25439.0
Triad NaN

Finally, because I'm being pedantic this week, ideally for a test like this we'd also have logs from a multi-NUMA-node system and not just a 4 vCPU instance. We're requiring those be run according to the new rules, so including the logs from them would be helpful.

@malucius-rh

pcp archive name
streams_size_stream.16384k_opt_level_2_threads_16_sockets_1.0
So we have an archive for each size, Opt level and thread count. We could bury this all into one archive, or grouping, but it will make the pmrep output harder to read and locate things. No particular preference, just the way I went.

I agree it will make the pmrep output harder to read, but doing it this way also makes comparing across sizes more complicated as we have to merge archives. This way is easier to parse the pmrep output, the other is easier to compare different sizes etc. Both have up sides and down sides, I just want to be sure we're intentional about what we're choosing here vs. taking the first available option.

If we ever get support for annotations in PCP where we can inject things like " here beginneth 16384k O2" into the archive the One Big Archive method will become easier to handle. Until then for cases like this where we effectively have sizes * optlevels separate tests which happen to be grouped for runtime convenience we may want to consider (even though I really don't like the approach) logging the "config knobs" as metrics before the individual runs, treating them as ugly annotations.
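
A minimal sketch of the "config knobs as metrics" idea, under stated assumptions: the metric names array_size_k and opt_level, the log_config_knobs helper, and the temporary output file are all hypothetical, and wiring the file into the openmetrics PMDA is not shown. Only the text exposition format ("# TYPE name gauge" followed by "name value") is standard.

```shell
#!/bin/sh
# Hypothetical sketch: metric names and the output location are assumptions,
# not what the wrapper or the openmetrics PMDA configuration actually uses.
metrics_file=$(mktemp)

log_config_knobs() {
    array_size_k=$1
    opt_level=$2
    {
        printf '# TYPE array_size_k gauge\n'
        printf 'array_size_k %s\n' "$array_size_k"
        printf '# TYPE opt_level gauge\n'
        printf 'opt_level %s\n' "$opt_level"
    } > "${metrics_file}"
}

# Record the knobs before the individual run starts.
log_config_knobs 16384 2
cat "${metrics_file}"
```

Numeric gauges like these would show up as extra columns next to Copy/Scale/Add/Triad, which is what makes them usable as crude markers inside one large archive.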

for string, and we filter out the line properly.
@dvalinrh dvalinrh requested a review from malucius-rh January 23, 2026 13:27
Development

Successfully merging this pull request may close these issues.

Runs with PCP enabled are failing

4 participants