Fix possible invalid instruction in inlined memmove, optimized LDDR path #9

calc84maniac · 2025-09-18T19:10:20Z

Inlined memmove was generating a comparison pseudo-instruction that used the input destination pointer register directly. That register could end up being one the comparison cannot encode, such as IY, producing an invalid sbc hl, iy instruction.

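I moved the comparison into the memmove pseudo-instruction itself to ensure it always uses sbc hl, de. I also optimized the LDDR path to reuse the subtraction result: HL still holds source minus destination, so the code adds BC-1 to DE and then adds DE back into HL, which leaves both pointers at the last byte of their buffers without reloading the source.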
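As a rough model of the register arithmetic in the new sequence (not the backend code itself), the pointer setup can be sketched in Python. The function names `memmove_setup` and `memmove_model` are hypothetical, and the 24-bit mask assumes eZ80 ADL mode; the comments map each step to the instructions in the "after" listing.

```python
MASK = 0xFFFFFF  # assumption: 24-bit registers (eZ80 ADL mode)


def memmove_setup(src, dest, count):
    """Model the pointer setup; returns (path, hl, de, bc) as seen by LDIR/LDDR."""
    hl, de, bc = src & MASK, dest & MASK, count & MASK
    # or a, a ; sbc hl, de  -> HL = src - dest, carry set on unsigned borrow
    carry = hl < de
    hl = (hl - de) & MASK
    if not carry:
        # add hl, de  -> restore the source pointer, then copy forward
        hl = (hl + de) & MASK
        return "ldir", hl, de, bc
    # ex de, hl ; add hl, bc ; dec hl ; ex de, hl  -> DE = dest + count - 1
    de = (de + bc - 1) & MASK
    # add hl, de  -> (src - dest) + (dest + count - 1) = src + count - 1
    hl = (hl + de) & MASK
    return "lddr", hl, de, bc


def memmove_model(mem, dest, src, count):
    """Run the modelled copy on a bytearray standing in for memory."""
    path, hl, de, bc = memmove_setup(src, dest, count)
    step = 1 if path == "ldir" else -1
    # Simplification: a zero count copies nothing here, unlike a real
    # LDIR/LDDR, which would wrap BC around.
    while bc:
        mem[de] = mem[hl]
        hl = (hl + step) & MASK
        de = (de + step) & MASK
        bc -= 1
```

When the source lies below the destination, the model takes the LDDR path, and the wrapped difference in HL cancels out once DE is added back, so no reload of the source pointer is needed.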

I also managed to ensure the BC, DE, HL registers are carried directly into the LDIR/LDDR paths, though I'm not 100% sure if what I did was valid (I allocated virtual registers to pass them into the blocks, then replaced all usages of those virtual registers with the physical registers). LLVM didn't yell at me though, unlike with some other approaches I tried.

Example code output before this fix:

        ld      hl, (iy + 5)
        ld      bc, 5
        call    __imulu
        push    hl
        pop     de
        ld      hl, _vars
        add     hl, de
        ld      (ix - 14), hl
        lea     de, iy
        or      a, a
        sbc     hl, iy
        jr      c, BB2_5
        ld      hl, (ix - 14)
        ld      iy, 5
        lea     bc, iy
        ldir
        jr      BB2_6
BB2_5:
        ld      hl, (ix - 14)
        ld      iy, 5
        lea     bc, iy
        add     hl, bc
        dec     hl
        ex      de, hl
        add     hl, bc
        dec     hl
        ex      de, hl
        lddr
BB2_6:
        ld      de, (ix - 14)
        ld      hl, (ix - 11)
        lea     bc, iy
        ldir

And after:

        ld      hl, (iy + 5)
        ld      bc, 5
        call    __imulu
        push    hl
        pop     de
        ld      hl, _vars
        add     hl, de
        lea     de, iy
        push    hl
        pop     iy
        or      a, a
        sbc     hl, de
        jr      c, BB2_5
        add     hl, de
        ldir
        jr      BB2_6
BB2_5:
        ex      de, hl
        add     hl, bc
        dec     hl
        ex      de, hl
        add     hl, de
        lddr
BB2_6:
        lea     de, iy
        ld      hl, (ix - 11)
        ld      bc, 5
        ldir

…PR57692) Callee save registers must be preserved, so -fzero-call-used-regs should not be zeroing them. The previous implementation only did not zero callee save registers that were saved&restored inside the function, but we need preserve all of them. Fixes llvm#57692. Differential Revision: https://reviews.llvm.org/D133946 (cherry picked from commit b430980)

…ockfree

In https://llvm.org/D56913, we added an emulation for the __atomic_always_lock_free compiler builtin when compiling in Freestanding mode. However, the emulation did not (and could not) give exactly the same answer as the compiler builtin, which led to a potential ABI break for e.g. enum classes.

After speaking to the original author of D56913, we agree that the correct behavior is to instead always use the compiler builtin, since that provides a more accurate answer, and __atomic_always_lock_free is a purely front-end builtin which doesn't require any runtime support. Furthermore, it is available regardless of the Standard mode (see https://godbolt.org/z/cazf3ssYY).

However, this patch does constitute an ABI break. As shown by https://godbolt.org/z/1eoex6zdK:

- In LLVM <= 11.0.1, an atomic<enum class with 1 byte> would not contain a lock byte.
- In LLVM >= 12.0.0, an atomic<enum class with 1 byte> would contain a lock byte.

This patch breaks the ABI again to bring it back to 1 byte, which seems like the correct thing to do.

Fixes llvm#57440

Differential Revision: https://reviews.llvm.org/D133377 (cherry picked from commit f1a601f)

Epilogue vectorization uses isScalarAfterVectorization to check if widened versions for inductions need to be generated and bails out in those cases. At the moment, there are scenarios where isScalarAfterVectorization returns true but VPWidenPointerInduction::onlyScalarsGenerated would return false, causing widening. This can lead to widened phis with incorrect start values being created in the epilogue vector body. This patch addresses the issue by storing the cost-model decision in VPWidenPointerInductionRecipe and restoring the behavior before 151c144. This effectively reverts 151c144, but the long-term fix is to properly support widened inductions during epilogue vectorization. Fixes llvm#57712.

…e release In LLVM 15, we added the deprecation markup for unary_function and binary_function for >= C++11, and we also removed it for >= C++17. While this is in accordance with the Standard, it's also a bit quick for our users, since there was no release in which the classes were marked as deprecated before their removal. We noticed widespread breakage due to this, and after months of trying to fix downstream failures, I am coming to the conclusion that users will be better served if we give them one release where unary_function is deprecated but still provided even in >= C++17. Differential Revision: https://reviews.llvm.org/D134473

This avoids deprecation warning:

```
warning: definition of implicit copy assignment operator for 'AddrInfo' is deprecated because it has a user-declared copy constructor [-Wdeprecated-copy]
```

This fixes llvm#57229 (cherry picked from commit 252cea0)

The unittests are already included in check-polly, so check-all was running them twice. Running them twice causes a race on the output files, which led to intermittent failures on the reverse-iteration buildbot. (cherry picked from commit 2c29268)

For RVC, GNU assembler and LLVM integrated assembler add c.nop followed by a sequence of 4-byte nops. Even if remove % 4 == 0, we have to split one 4-byte nop and therefore need to write the code sequence, otherwise we create an incorrect c.unimp. (cherry picked from commit 78084d9)

…rs' is deactivated 'misc-const-correctness' previously considered arrays as 'Values' independent of the type of the elements. This is inconsistent with the configuration of the check to disable treating pointers as values. This patch rectifies this inconsistency. Fixes llvm#56749 Reviewed By: njames93 Differential Revision: https://reviews.llvm.org/D130793 (cherry picked from commit e66345d)

Improve the documentation for 'misc-const-correctness' to:

- include better examples
- improve the english
- fix links to other checks that were broken due to the directory-layout changes
- mention the limitation that the check does not run on `C` code.

Addresses llvm#56749, llvm#56958

Reviewed By: njames93

Differential Revision: https://reviews.llvm.org/D132244 (cherry picked from commit b5b7503)

If a C source file includes the libc++ stdatomic.h, compilation will break because (a) the C++ standard check will fail (which is expected), and (b) `_LIBCPP_COMPILER_CLANG_BASED` won't be defined because the logic defining it in `__config` is guarded by a `__cplusplus` check, so we'll end up with a blank header. Move the detection logic outside of the `__cplusplus` check to make the second check pass even in a C context when you're using Clang. Note that `_LIBCPP_STD_VER` is not defined when in C mode, hence stdatomic.h needs to check if in C++ mode before using that macro to avoid a warning. In an ideal world, a C source file wouldn't be including the libc++ header directory in its search path, so we'd never have this issue. Unfortunately, certain build environments make this hard to guarantee, and in this case it's easy to tweak this header to make it work in a C context, so I'm hoping this is acceptable. Fixes llvm#57710. Differential Revision: https://reviews.llvm.org/D134591 (cherry picked from commit afec0f0)

The dependent code has been changed quite a lot since 151c144 which b73d2c8 effectively reverts. Now we run into a case where lowering didn't expect/support the behavior pre 151c144 any longer. Update the code dealing with scalable pointer inductions to also check for uniformity in combination with isScalarAfterVectorization. This should ensure scalable pointer inductions are handled properly during epilogue vectorization. Fixes llvm#57912. (cherry picked from commit 2c692d8)

Currently, clang does not emit debuginfo for the switch stmt case value if it is an enum value. For example,

        $ cat test.c
        enum { AA = 1, BB = 2 };
        int func1(int a) {
          switch(a) {
          case AA: return 10;
          case BB: return 11;
          default: break;
          }
          return 0;
        }
        $ llvm-dwarfdump test.o | grep AA
        $

Note that gcc does emit debuginfo for the same test case. This patch added such a support with similar implementation to CodeGenFunction::EmitDeclRefExprDbgValue(). With this patch,

        $ clang -g -c test.c
        $ llvm-dwarfdump test.o | grep AA
          DW_AT_name ("AA")
        $

Differential Revision: https://reviews.llvm.org/D134705 (cherry picked from commit 75be048)

add LLVM_PREFER_STATIC_ZSTD (default TRUE) cmake config flag (compression test seems to fail for shared zstd on windows, note that zstd multithread is by default disabled in the static build so it may be a hidden variable) propagate variable zstd_DIR in LLVMConfig.cmake.in fix llvm-config CMakeLists.txt behavior for absolute libs windows get zstd lib name Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D132870 (cherry picked from commit c0b4f24)

brad0 and others added 30 commits September 19, 2022 08:51

[lit] Set shlibpath_var on OpenBSD

433f2aa

(cherry picked from commit 3eca0b3)

[libcxx] Bump libc++ version to 15.0.1

43b5b04

[docs] Fix build-docs.sh

c6d2e8b

If libcxxabi is not included CMake will error out: Cannot find target libcxxabi-SHARED I ran into this doing the 15.0.0 release Differential Revision: https://reviews.llvm.org/D133475

[LV] Add tests for epilogue vectorization with widened inductions.

c079a29

Includes a test for the miscompile in llvm#57712.

[LV] Move new epilog-vectorization-widen-inductions.ll to AArch64 dir.

38b5fa7

The test requires the AArch64 backend, so move it to the right subdir.

SPIRV: Fix compilation in NDEBUG.

5d9fa4d

(cherry picked from commit 59351fe)

[MachineCycle][NFC] add a cache for block and its top level cycle

f1ad3ab

This solves llvm#57664 Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D134019 (cherry picked from commit c941d92)

Bump version to 15.0.2

451e3b6

[clangd] Avoid crash when printing call to string literal operator te…

e08c165

…mplate Differential Revision: https://reviews.llvm.org/D132830 (cherry picked from commit 898c421)

Fix build error in StmtPrinterTest.cpp

10a5497

(cherry picked from commit c933453)

[InstSimplify] Add test for PR58046 (NFC)

6ba100a

[ValueTracking] Fix CannotBeOrderedLessThanZero() for fdiv (PR58046)

77ff99c

When checking the RHS of fdiv, we should set the SignBitOnly flag, because a negative zero can become -Inf, which is ordered less than zero. Fixes llvm#58046. Differential Revision: https://reviews.llvm.org/D134876

[Clang] Fix variant crashes from GH58028, GH57370

ebbb544

Fixes a null dereference in some diagnostic issuing code. Closes llvm#57370 Closes llvm#58028 Reviewed By: shafik Differential Revision: https://reviews.llvm.org/D134885 (cherry picked from commit 9415aad)

[LV] Convert sve-epilog-vect.ll to use opaque pointers.

966e71d

(cherry picked from commit 05b3493)

[LV] Add test for llvm#57912.

b3669eb

Add test showing miscompilation during epilogue vectorization with SVE. (cherry picked from commit 1716700)

[gn build] (manually) port 18b4a8b more

541ea23

(cherry picked from commit dd428a5)

use LLVM_USE_STATIC_ZSTD

4bd3f37

removes LLVM_PREFER_STATIC_ZSTD in favor of using a LLVM_USE_STATIC_ZSTD Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D133222 (cherry picked from commit fc1da04)

adriweb and others added 28 commits May 12, 2025 13:34

[rebase fixes] fix z80 Target

54d79a5

[rebase fixes] fix GlobalISEL

3b4a160

[rebase fixes] fix cc1

2df38fc

[rebase fixes] misc fixes

abc64a0

[CI] add rebase-v15 to CI

c87f4d0

Add option for GAS-style directives

7d7afbe

a few changes

dd0559c

Fix 64-bit float libcalls which use standard library functions

bd5a60c

Define CLZ behavior for zero input on Z80

0c57895

Prevent despeculation of count-zero operations on Z80

d1ad5e5

test: z80 codegen test updates.

0b05aee

python3 ../llvm/utils/update_llc_test_checks.py --llc-binary ./bin/llc ../llvm/test/CodeGen/Z80/*.ll

[Z80] improve gas syntax

ed66658

More Z80MCAsmInfo fixes

5318730

Fix assume adl=1 on Z80

628e006

Fix fixed-length character arrays producing invalid GAS syntax

c195886

Correct alignment directive

390fd31

Use tabs for ADL directives

7da9970

Idk how to squash with commits inbetween

Do not null out zero directive for GAS

5d10ecb

String fix v2

26c058e

Enable interrupts before returning from interrupt handler

1dacf03

Add CTTZ libcalls and use them on Z80 targets

e1486dc

adjust CI

db91fd7

build: explicitly disable building with zstd

e04f59c

SmallVector.h: add stdint include. Fix #8

757a027

Not needed on some compilers/toolchain, but doesn't hurt anyway. More modern LLVM versions have it as well.

ci: windows: disable old wmic-based disk usage fetching.

6593d86

If needed, we could run something like this in powershell: Get-CimInstance Win32_LogicalDisk | Select-Object DeviceID, Size, FreeSpace

Add pull requests to github workflows

5fc6989

Fix assertion failure on liveness analysis of comparison pseudo-instr…

1cbcc3b

…uctions

Fix invalid comparison codegen for inlined memmove, optimize LDDR path

629b8d7

calc84maniac force-pushed the fix-memmove branch from 19b92bc to 629b8d7 Compare September 22, 2025 02:31

adriweb force-pushed the z80 branch from dc0f841 to d7faed1 Compare December 29, 2025 21:00
