Merged
74 changes: 74 additions & 0 deletions crates/core_arch/src/s390x/vector.rs
@@ -281,6 +281,12 @@ unsafe extern "unadjusted" {
#[link_name = "llvm.s390.vfenezbs"] fn vfenezbs(a: i8x16, b: i8x16) -> PackedTuple<i8x16, i32>;
#[link_name = "llvm.s390.vfenezhs"] fn vfenezhs(a: i16x8, b: i16x8) -> PackedTuple<i16x8, i32>;
#[link_name = "llvm.s390.vfenezfs"] fn vfenezfs(a: i32x4, b: i32x4) -> PackedTuple<i32x4, i32>;

#[link_name = "llvm.s390.vclfnhs"] fn vclfnhs(a: vector_signed_short, immarg: i32) -> vector_float;
#[link_name = "llvm.s390.vclfnls"] fn vclfnls(a: vector_signed_short, immarg: i32) -> vector_float;
#[link_name = "llvm.s390.vcfn"] fn vcfn(a: vector_signed_short, immarg: i32) -> vector_signed_short;
#[link_name = "llvm.s390.vcnf"] fn vcnf(a: vector_signed_short, immarg: i32) -> vector_signed_short;
#[link_name = "llvm.s390.vcrnfs"] fn vcrnfs(a: vector_float, b: vector_float, immarg: i32) -> vector_signed_short;
}

#[repr(simd)]
@@ -5911,6 +5917,74 @@ pub unsafe fn vec_promote<T: sealed::VectorPromote>(a: T::ElementType, b: i32) -
T::vec_promote(a, b)
}

/// Converts the left-most half of `a` to a vector of single-precision numbers.
/// The format of the source vector elements is specified by `B`.
#[inline]
#[target_feature(enable = "nnp-assist")]
#[cfg_attr(test, assert_instr(vclfnh, B = 0))]
#[unstable(feature = "stdarch_s390x", issue = "135681")]
pub unsafe fn vec_extend_to_fp32_hi<const B: i32>(a: vector_signed_short) -> vector_float {
// On processors implementing the IBM z16 architecture, only the value 0 is supported.
static_assert_uimm_bits!(B, 4);

vclfnhs(a, B)
}

/// Converts the right-most half of `a` to a vector of single-precision numbers.
/// The format of the source vector elements is specified by `B`.
#[inline]
#[target_feature(enable = "nnp-assist")]
#[cfg_attr(test, assert_instr(vclfnl, B = 0))]
#[unstable(feature = "stdarch_s390x", issue = "135681")]
pub unsafe fn vec_extend_to_fp32_lo<const B: i32>(a: vector_signed_short) -> vector_float {
// On processors implementing the IBM z16 architecture, only the value 0 is supported.
static_assert_uimm_bits!(B, 4);

vclfnls(a, B)
Contributor Author

Is this equivalent to

https://godbolt.org/z/dGaf4P7sa

Clearly that optimizes horribly at the moment. If passing 0 as the const value does the obvious thing, I believe all of these could be implemented in terms of simpler SIMD primitives.

Contributor

If only 0 is currently supported, we can just use SIMD primitives, since the assertion ensures that no other value is passed.

Contributor Author

We could, though the conversion method currently seems to be unspecified (I think only one implementation actually makes sense, but then why is the immediate argument even there?).

Also, the SIMD primitives currently don't optimize into the instruction that this intrinsic should map to.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AI accelerator unit operates on its own private data types. In particular, it uses a 16-bit floating point type which is neither IEEE-16 nor bfloat16, but a proprietary format. In order to prepare input/output data to be used with the accelerator, applications need to convert standard (IEEE) data types to and from this private data type; for this purpose, the ISA provides these conversion instructions (mapped to compiler intrinsics).

In principle, the accelerator might support multiple different private data types, and the immediate operand of these intrinsics identifies which of those types the conversion should target. The set of supported types is not fixed by the ISA and may differ between processor generations. However, all current processors support only a single private data type, identified by the immediate value 0.

So in practice, the immediate will always be 0 today. I'm not convinced this ought to be enforced by the compiler - if a future processor adds a second type, it might be good if we could use the intrinsic without having to update the compiler.

Either way, whatever the immediate value is, there is no possibility to open-code the conversion with standard LLVM IR - the private floating-point format is unknown to LLVM! This absolutely has to map to the LLVM builtin (and thus the special instruction).
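
To illustrate why 16 bits cannot be converted without knowing the format, the sketch below decodes the same bit pattern under two publicly documented layouts, IEEE binary16 and bfloat16. It does not implement the accelerator's private format, which this thread does not describe, and the function names are made up for the example.

```rust
/// Decode IEEE binary16: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
/// Illustrative helper only; not part of stdarch.
fn binary16_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0_f32 } else { 1.0 };
    let exp = ((bits >> 10) & 0x1f) as i32;
    let frac = f32::from(bits & 0x03ff);
    match exp {
        // Subnormal: (frac / 1024) * 2^-14 == frac * 2^-24.
        0 => sign * frac * 2.0_f32.powi(-24),
        // All-ones exponent encodes infinity (zero fraction) or NaN.
        0x1f if frac == 0.0 => sign * f32::INFINITY,
        0x1f => f32::NAN,
        // Normal: implicit leading 1, exponent bias 15.
        _ => sign * (1.0 + frac / 1024.0) * 2.0_f32.powi(exp - 15),
    }
}

/// Decode bfloat16: the upper 16 bits of an IEEE binary32 value.
/// Illustrative helper only; not part of stdarch.
fn bfloat16_to_f32(bits: u16) -> f32 {
    f32::from_bits(u32::from(bits) << 16)
}
```

The pattern `0x3C00` is 1.0 as binary16 but 2^-7 as bfloat16, so a third, undocumented layout cannot be decoded by guessing; only the dedicated instruction knows how to interpret it.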

Contributor Author

Good to know. I've changed the code to accept the full 0..=15 range there.
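
As a rough illustration of what accepting the full 0..=15 range means, the sketch below mimics a 4-bit unsigned immediate check using plain const generics. `AssertUimm4` and `format_selector` are hypothetical names for this example, and this is not how `static_assert_uimm_bits!` is implemented in stdarch.

```rust
/// Hypothetical stand-in for a 4-bit unsigned immediate check.
struct AssertUimm4<const B: i32>;

impl<const B: i32> AssertUimm4<B> {
    /// Evaluated at compile time for each `B` that is actually used.
    const OK: () = assert!(
        B >= 0 && B < (1 << 4),
        "immediate must fit in 4 unsigned bits"
    );
}

/// Hypothetical function that takes the format selector as a const generic,
/// in the same style as the intrinsics in this diff.
fn format_selector<const B: i32>() -> i32 {
    // Referencing the associated const forces the range check.
    let () = AssertUimm4::<B>::OK;
    B
}
```

With this shape, `format_selector::<0>()` through `format_selector::<15>()` build, while `format_selector::<16>()` fails to build. The check only constrains the encoding; whether a given processor supports a particular value is a separate question.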

}

/// Converts the elements of vector `a` to the 16-bit IEEE floating point format.
/// The format of the source vector elements is specified by `B`.
#[inline]
#[target_feature(enable = "nnp-assist")]
#[cfg_attr(test, assert_instr(vcfn, B = 0))]
#[unstable(feature = "stdarch_s390x", issue = "135681")]
pub unsafe fn vec_convert_to_fp16<const B: i32>(a: vector_signed_short) -> vector_signed_short {
// On processors implementing the IBM z16 architecture, only the value 0 is supported.
static_assert_uimm_bits!(B, 4);

vcfn(a, B)
}

/// Converts the elements of vector `a` to an internal floating point format.
/// The format of the target vector elements is specified by `B`.
#[inline]
#[target_feature(enable = "nnp-assist")]
#[cfg_attr(test, assert_instr(vcnf, B = 0))]
#[unstable(feature = "stdarch_s390x", issue = "135681")]
pub unsafe fn vec_convert_from_fp16<const B: i32>(a: vector_signed_short) -> vector_signed_short {
// On processors implementing the IBM z16 architecture, only the value 0 is supported.
static_assert_uimm_bits!(B, 4);

vcnf(a, B)
}

/// Converts the elements of single-precision vectors `a` and `b` to an internal floating point
/// format with 16-bit sized elements. The format of the target vector elements is specified by `C`.
#[inline]
#[target_feature(enable = "nnp-assist")]
#[unstable(feature = "stdarch_s390x", issue = "135681")]
#[cfg_attr(test, assert_instr(vcrnf, C = 0))]
pub unsafe fn vec_round_from_fp32<const C: i32>(
a: vector_float,
b: vector_float,
) -> vector_signed_short {
// On processors implementing the IBM z16 architecture, only the value 0 is supported.
static_assert_uimm_bits!(C, 4);

vcrnfs(a, b, C)
}

#[cfg(test)]
mod tests {
use super::*;