| 102 | Note that there are lots of potential tricks. For example, |

| 103 | consider the implementation of simd_hl<2>::add(a), where |

| 104 | addition is natively supported only for larger field widths. |

| 105 | A direct implementation requires one shift, two masks and one add |

| 106 | operation: |

| 107 | |

| 108 | simd<16>::add(simd<16>::srli(a, 1) & simd<2>::constant(1), a & simd<2>::constant(1)) |

| 109 | |

| 110 | But one of the masks can be eliminated by taking advantage |

| 111 | of the properties of 2-bit subtraction: |

| 112 | simd<16>::sub(a, simd<16>::srli(a, 1) & simd<2>::constant(1)) |

| 113 | |

| 114 | |

| 115 | |

| 116 | |
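The equivalence of the two forms can be checked with a small scalar model. This is only a sketch under stated assumptions: a single `uint16_t` stands in for the SIMD register, the constant `0x5555` stands in for `simd<2>::constant(1)`, and the names `add_hl2_direct`, `add_hl2_sub` and `check_all_inputs` are hypothetical helpers, not part of the library. The subtraction form works because a 2-bit field with bits (h, l) has value 2h + l, and (2h + l) - h = h + l; since h never exceeds the field value, no field borrows from its neighbour.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model (assumption): one 16-bit word holds eight 2-bit fields.
// 0x5555 has a 1 in the low bit of every 2-bit field, i.e. it plays the
// role of simd<2>::constant(1) in the comment above.
constexpr std::uint16_t kLowBits = 0x5555;

// Direct form: one shift, two masks, one add.
constexpr std::uint16_t add_hl2_direct(std::uint16_t a) {
    return static_cast<std::uint16_t>(((a >> 1) & kLowBits) + (a & kLowBits));
}

// Subtraction form: one shift, one mask, one subtract.
// Per field: (2h + l) - h = h + l, and h <= 2h + l, so no borrow
// crosses a field boundary.
constexpr std::uint16_t add_hl2_sub(std::uint16_t a) {
    return static_cast<std::uint16_t>(a - ((a >> 1) & kLowBits));
}

// Exhaustively compare the two forms over every 16-bit input.
inline void check_all_inputs() {
    for (std::uint32_t a = 0; a <= 0xFFFF; ++a) {
        const auto x = static_cast<std::uint16_t>(a);
        assert(add_hl2_direct(x) == add_hl2_sub(x));
    }
}
```

Calling `check_all_inputs()` walks all 65536 inputs. As a spot check, the input `0x1B` (fields 00, 01, 10, 11 from high to low) maps to `0x16` (fields 00, 01, 01, 10) under both forms.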