Merged
24 changes: 7 additions & 17 deletions .github/workflows/ci.yml
@@ -5,6 +5,8 @@ on:
push:
pull_request:
branches: [ "main" ]
# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:

env:
parallel_processes: 8 # A good default count is: available threads + 4
@@ -29,38 +31,26 @@ jobs:
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: |
cmake -S ${{github.workspace}}/submissions/submission_25_05_01 -B ${{github.workspace}}/build/submission_25_05_01 -DCMAKE_BUILD_TYPE=${{matrix.build_type}}
cmake -S ${{github.workspace}}/submissions/submission_25_05_08 -B ${{github.workspace}}/build/submission_25_05_08 -DCMAKE_BUILD_TYPE=${{matrix.build_type}}
cmake -S ${{github.workspace}}/submissions/submission_25_05_15 -B ${{github.workspace}}/build/submission_25_05_15 -DCMAKE_BUILD_TYPE=${{matrix.build_type}}
cmake -S ${{github.workspace}}/submissions/submission_25_05_22 -B ${{github.workspace}}/build/submission_25_05_22 -DCMAKE_BUILD_TYPE=${{matrix.build_type}}
cmake -S ${{github.workspace}}/submissions/neon -B ${{github.workspace}}/build/neon -DCMAKE_BUILD_TYPE=${{matrix.build_type}}
cmake -S ${{github.workspace}} -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{matrix.build_type}}

- name: Build
# Build your program with the given configuration
run: |
cmake --build ${{github.workspace}}/build/submission_25_05_01 --config ${{matrix.build_type}} -j ${{env.parallel_processes}}
cmake --build ${{github.workspace}}/build/submission_25_05_08 --config ${{matrix.build_type}} -j ${{env.parallel_processes}}
cmake --build ${{github.workspace}}/build/submission_25_05_15 --config ${{matrix.build_type}} -j ${{env.parallel_processes}}
cmake --build ${{github.workspace}}/build/submission_25_05_22 --config ${{matrix.build_type}} -j ${{env.parallel_processes}}
cmake --build ${{github.workspace}}/build/neon --config ${{matrix.build_type}} -j ${{env.parallel_processes}}
cmake --build ${{github.workspace}}/build --config ${{matrix.build_type}} -j ${{env.parallel_processes}}

- name: Test
working-directory: ${{github.workspace}}/build
# Execute tests defined by the CMake configuration.
run: |
ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --test-dir submission_25_05_01 --output-on-failure
ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --test-dir submission_25_05_08 --output-on-failure
ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --test-dir submission_25_05_15 --output-on-failure
ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --test-dir submission_25_05_22 --output-on-failure
ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --output-on-failure -E "^Test einsum tree optimize and execute first example"
ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --test-dir neon --output-on-failure
ctest -j ${{env.parallel_processes}} -C ${{matrix.build_type}} --output-on-failure

- name: Test + Valgrind
working-directory: ${{github.workspace}}/build
# Execute tests defined by the CMake configuration.
run: |
ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --test-dir submission_25_05_01 --output-on-failure
ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --test-dir submission_25_05_08 --output-on-failure
ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --test-dir submission_25_05_15 --output-on-failure
ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --test-dir submission_25_05_22 --output-on-failure
ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --test-dir neon --output-on-failure
ctest -j ${{env.parallel_processes}} -T memcheck -C ${{matrix.build_type}} --output-on-failure -E "^Test *(gemm generation|unary|tensor operation|parallel tensor operation|einsum tree execute|einsum tree optimize and execute)"
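
Since this change collapses the per-submission CMake invocations into a single top-level configure, build, and test, the same sequence can be approximated locally. The following is a minimal sketch, not part of the workflow: the function name `ci_local`, the `RUN` dry-run hook, and the `Release` default are assumptions made for illustration. It also assumes CMake 3.20 or newer (for `ctest --test-dir`) and that it is run from the repository root.

```shell
# Sketch: run the consolidated configure/build/test steps locally.
# Assumptions: executed from the repository root; CMake >= 3.20.
# RUN is a hypothetical hook: RUN=echo prints the commands instead of running them.
ci_local() {
  local build_type="${1:-Release}"   # CI takes this value from matrix.build_type
  local jobs="${2:-8}"               # mirrors env.parallel_processes in ci.yml
  # Any arguments after the first two are forwarded to the configure step.
  if [ "$#" -gt 2 ]; then shift 2; else set --; fi

  ${RUN:-} cmake -S . -B build -DCMAKE_BUILD_TYPE="$build_type" "$@" &&
  ${RUN:-} cmake --build build --config "$build_type" -j "$jobs" &&
  ${RUN:-} ctest --test-dir build -j "$jobs" -C "$build_type" --output-on-failure
}
```

For example, `RUN=echo ci_local Debug 4 -DSAVE_JITS_TO_FILE=ON` prints the three commands for a Debug build with the `SAVE_JITS_TO_FILE` option from the top-level `CMakeLists.txt` enabled, while `ci_local` without arguments runs a Release build with 8 parallel jobs. With the `workflow_dispatch` trigger added in this change, the workflow itself can also be started from the GitHub CLI with `gh workflow run ci.yml`, assuming the CLI is installed and authenticated.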

5 changes: 5 additions & 0 deletions .vscode/settings.json
@@ -88,12 +88,17 @@
"Fastor",
"fmax",
"fmla",
"GFLOPS",
"heapbytes",
"jited",
"linalg",
"madd",
"matmul",
"MATMUL",
"MATMULS",
"microbenchmark",
"Microbenchmark",
"microbenchmarks",
"microkernel",
"MINIJIT",
"movz",
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -27,7 +27,7 @@ message(STATUS "Build Type: ${CMAKE_BUILD_TYPE}")
# =============================================================
# Extra build options
# =============================================================
option(SAVE_JITS_TO_FILE "Saves the jitted kernels into a file if activated." OFF)
option(SAVE_JITS_TO_FILE "Saves the JITed kernels into a file if activated." OFF)


if(SAVE_JITS_TO_FILE)
2 changes: 2 additions & 0 deletions docs_sphinx/chapters/assembly.rst
@@ -4,6 +4,8 @@ Assembly
Before we begin implementing the individual components of the project, we will start with a brief review of assembly language.
This short chapter is intended as a refresher on the basic knowledge required for the project.

All files related to the tasks of this chapter can be found under ``submissions/assembly/``.

Hello Assembly
--------------

13 changes: 6 additions & 7 deletions docs_sphinx/chapters/base.rst
@@ -3,12 +3,13 @@ Base

In this chapter, we get more familiar with some base ARM64 assembly instructions and how to benchmark the performance of such instructions.

All files related to the tasks of this chapter can be found under ``submissions/base/``.

Copying Data
------------

First, we will implement the functionality of the given ``copy_c_0`` and ``copy_c_1`` C functions from the ``copy_c.c`` file using only base instructions.
The corresponding assembly code will be written in the ``copy_asm_0`` and ``copy_asm_1`` functions, located in the ``copy_asm.s`` file under
``submissions/submission_25_04_24/copy_asm.s``.
The corresponding assembly code will be written in the ``copy_asm_0`` and ``copy_asm_1`` functions, located in the ``copy_asm.s`` file.

1. copy_asm_0
^^^^^^^^^^^^^
@@ -53,7 +54,7 @@ The corresponding assembly code will be written in the ``copy_asm_0`` and ``copy
cmp x3, x0 // compare value in x3 and x0
b.ge end_loop // exit the loop once the counter x3 >= n (held in x0)

ldr w4, [x1, x3, lsl #2] // adress = x1 + (x3 << 2)
ldr w4, [x1, x3, lsl #2] // address = x1 + (x3 << 2)
str w4, [x2, x3, lsl #2] // x3 << 2 = x3 * 4

add x3, x3, #1
@@ -79,9 +80,7 @@ Instruction Throughput and Latency

The next task is to benchmark the execution throughput and latency of the ``ADD`` (shifted register) and ``MUL`` instructions.

Our implementation is located under the directory ``submissions/submission_25_05_24/``.

Files: ``submissions/submission_25_05_24/``
Files:
- ``benchmark_driver.cpp``
- ``benchmark.s``

@@ -151,7 +150,7 @@ throughput and latency. For the throughput measurement of ``ADD`` this looks lik
ret
.size throughput_add, (. - throughput_add)

Throughput measurement of ``MUL`` is similar. For the latency benchmakring we use read-after-write dependencies to measure the latency of the instructions.
Throughput measurement of ``MUL`` is similar. For the latency benchmarking we use read-after-write dependencies to measure the latency of the instructions.
For ``ADD`` this looks like this:

.. code-block:: asm