I’d like to perform polynomial multiplication of two `uint64_t` values (where the least significant bit, the one obtained by `w&1`, is the least significant coefficient, i.e. the a_{0} in w(x)=∑_{i}a_{i}*x^{i}) on ARM, and get the least significant 64 coefficients (a_{0}…a_{63}) of the result as a `uint64_t` (so `result>>i&1` is a_{i}).
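(For concreteness, here is a portable, intrinsics-free sketch of what I mean by "the least significant 64 coefficients of the product" — plain shift-and-XOR carry-less multiplication. The function name is mine, just for illustration:)

```
#include <cstdint>

// Low 64 coefficients of the carry-less (GF(2) polynomial) product of
// v and w: for every set bit i of w, XOR in v shifted left by i.
// Coefficients a_{64} and above simply fall off the top.
uint64_t clmul_low_reference(uint64_t v, uint64_t w) {
    uint64_t result = 0;
    for (int i = 0; i < 64; ++i) {
        if (w >> i & 1) {
            result ^= v << i;
        }
    }
    return result;
}
```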

It’s not clear to me, however, what the standard-compliant way is to convert a `uint64_t` to `poly64_t`, and the (least significant part of a) `poly128_t` to `uint64_t`.

> poly8_t, poly16_t, poly64_t and poly128_t are defined as unsigned integer types. It is unspecified whether these are the same type as uint8_t, uint16_t, uint64_t and uint128_t for overloading and mangling purposes.

> ACLE does not define whether int64x1_t is the same type as int64_t, or whether uint64x1_t is the same type as uint64_t, or whether poly64x1_t is the same as poly64_t for example for C++ overloading purposes.

source: https://developer.arm.com/documentation/101028/0009/Advanced-SIMD--Neon--intrinsics

The quotes above open up some scary possibilities in my head: perhaps the bit order is flipped, or there’s some padding, or, who knows, maybe these are actually `struct`s.

So far I’ve come up with these two:

```
#include <arm_neon.h>

poly64_t uint64_t_to_poly64_t(uint64_t x) {
    return vget_lane_p64(vcreate_p64(x), 0);
}

uint64_t less_sinificant_half_of_poly128_t_to_uint64_t(poly128_t big) {
    return vgetq_lane_u64(vreinterpretq_u64_p128(big), 0);
}
```

But they seem cumbersome (as they go through intermediary types like `poly64x1_t`), and they still make some assumptions: that `poly128_t` can be treated as a vector of two `uint64_t`s, that the 0-th `uint64_t` contains the "less significant coefficients", and that the least significant polynomial coefficient sits at the least significant bit of that `uint64_t`.

OTOH, it seems that I can simply "ignore" the whole issue and just pretend that integers are polynomials, as the two functions produce the same assembly:

```
__attribute__((target("+crypto")))
uint64_t polynomial_mul_low(uint64_t v, uint64_t w) {
    const poly128_t big = vmull_p64(uint64_t_to_poly64_t(v),
                                    uint64_t_to_poly64_t(w));
    return less_sinificant_half_of_poly128_t_to_uint64_t(big);
}

__attribute__((target("+crypto")))
uint64_t polynomial_mul_low_naive(uint64_t v, uint64_t w) {
    return vmull_p64(v, w);
}
```

that is:

```
fmov d0, x0
fmov d1, x1
pmull v0.1q, v0.1d, v1.1d
fmov x0, d0
ret
```

Also, the assembly for `uint64_t_to_poly64_t` and `less_sinificant_half_of_poly128_t_to_uint64_t` seems to be a no-op, which supports the hypothesis that no actual steps are involved in the conversion.

(See above in action: https://godbolt.org/z/o6bYsn4E4)
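(If it helps frame the question: when a conversion really has to be a bit-for-bit reinterpretation, the idiom I'd normally reach for in standard C++ is `memcpy` between equal-sized trivially-copyable objects, which compilers lower to a no-op or a single register move. A sketch of that pattern with plain integer types — using a hypothetical `Bits64` stand-in rather than `poly64_t`, since the NEON types' layout is exactly what's in question here:)

```
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an opaque 64-bit type.
struct Bits64 { uint64_t raw; };

// Bit-for-bit copy into the opaque type via memcpy.
Bits64 to_bits(uint64_t x) {
    static_assert(sizeof(Bits64) == sizeof(uint64_t), "sizes must match");
    Bits64 b;
    std::memcpy(&b, &x, sizeof(b));
    return b;
}

// Bit-for-bit copy back out again.
uint64_t from_bits(Bits64 b) {
    uint64_t x;
    std::memcpy(&x, &b, sizeof(x));
    return x;
}
```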

Also:

```
__attribute__((target("+crypto")))
uint64_t polynomial_mul_low_naive(uint64_t v, uint64_t w) {
    return (uint64_t)vmull_p64(poly64_t{v}, poly64_t{w});
}
```

seems to compile, and while the `{..}`s give me the soothing confidence that no narrowing occurred, I’m still unsure whether the order of the bits and the order of the coefficients are guaranteed to be consistent, and thus I have some worries about the final `(uint64_t)` cast.

I want my code to be correct with respect to the standards, as opposed to merely working by accident, as it has to be written once and run on many ARM64 platforms. Hence my question:

**How does one perform a proper conversion between polyXXX_t and uintXXX_t, and how does one extract "lower half of coefficients" from polyXXX_t?**

Source: Windows Questions C++