Elementary Functions: Algorithms and Implementation
Rigorous treatment of transcendental function implementation: IEEE 754 fundamentals (ULPs, rounding modes, FMA, subnormals), polynomial approximation via Chebyshev/Remez minimax, Horner/Estrin evaluation, range reduction (Cody-Waite, Payne-Hanek for large arguments), CORDIC for sin/cos/atan, and correct-rounding via Ziv's multilevel strategy.
- Evaluate polynomials efficiently using Horner's method (sequential) or Estrin's method (pipelined)
- Design minimax polynomial approximations using Chebyshev polynomials and the Remez algorithm
- Implement range reduction for exp/log/sin/cos using Cody-Waite or Payne-Hanek techniques
- Reason about floating-point error in ULPs and design algorithms targeting ≤1 ulp accuracy
- Apply the CORDIC algorithm to compute sin/cos/atan via iterative shift-and-add rotations
- Use FMA (fused multiply-add) to avoid intermediate rounding in double-double arithmetic
Install this skill and Claude can implement the full three-stage pipeline for any transcendental function — Cody-Waite range reduction, Horner/Estrin polynomial evaluation, and FMA-based reconstruction — while reasoning about per-stage ULP error budgets and targeting faithful or correctly-rounded output
Explains exactly why naive implementations of exp, log, sin, and cos lose precision and how production libm libraries solve it — enabling Claude to implement correctly-rounded elementary functions, audit numerical code for precision loss, and choose the right accuracy-versus-performance tradeoff
- Implementing a double-precision sin(x) using Cody-Waite range reduction to [-π/4, π/4] followed by a degree-11 Horner polynomial targeting faithful rounding within 1 ULP
- Tracing the ULP error introduced at each stage of a multi-step floating-point computation to identify which stage — range reduction, polynomial evaluation, or reconstruction — dominates total error
- Implementing the CORDIC rotation-mode algorithm for an FPGA target where multiplication is expensive, using only integer shifts, adds, and a precomputed atan(2^-i) table with scale-factor correction
Elementary Functions Skill
IEEE 754 Floating-Point Fundamentals
Formats
| Format | Sign | Exponent | Mantissa | Total | Range |
|---|---|---|---|---|---|
| Binary32 (float) | 1 | 8 | 23 | 32 | ±3.4×10³⁸ |
| Binary64 (double) | 1 | 11 | 52 | 64 | ±1.8×10³⁰⁸ |
| Binary80 (x87 long double) | 1 | 15 | 63+1 | 80 | ±1.2×10⁴⁹³² |
double representation:
value = (-1)^sign × 1.mantissa × 2^(exponent - 1023)
"hidden bit": leading 1 not stored (implied)
subnormals: exponent = 0 → no hidden bit, gradual underflow
Rounding Modes (IEEE 754)
Round to nearest, ties to even (RNE) — default
Round toward +∞ (ceiling)
Round toward −∞ (floor)
Round toward zero (truncation)
Correct rounding: returned result = exact rounding of mathematical result
→ this is what IEEE 754 guarantees for +, -, *, /, √
→ NOT guaranteed by standard for transcendentals (exp, log, sin, etc.)
→ correctly rounded transcendentals = difficult research problem
ULP (Unit in the Last Place)
ulp(x): spacing between consecutive floats around x
= 2^(exponent of x) × 2^(-52) (for double)
Error measurement: "0.5 ulp error" means result within 0.5 spacing of exact
Correctly rounded → ≤ 0.5 ulp error
"Faithful rounding" → ≤ 1 ulp error (less strict, easier to achieve)
In code:
double u = nextafter(x, INFINITY) - x;  // ulp(x): gap between x and the next double above
Fused Multiply-Add (FMA)
FMA(a, b, c) = a*b + c with a single rounding (not two)
= round(a×b + c) vs. round(round(a×b) + c)
Key uses:
1. Accurate dot products: avoid intermediate rounding
2. Error-free transformations: s = fl(a+b), e = (a+b) - s recovered exactly (Knuth's TwoSum; Dekker's Fast2Sum when |a| ≥ |b|)
3. Compensated summation (Kahan summation)
4. Fast 2-sum algorithms in double-double arithmetic
CPU support: x86 (FMA3, shipped alongside AVX2 since Haswell), ARM (NEON/SVE); in C: fma() from <math.h> or __builtin_fma(a,b,c)
Polynomial Evaluation
Horner’s Method (preferred)
P(x) = a₀ + a₁x + a₂x² + ... + aₙxⁿ
= a₀ + x(a₁ + x(a₂ + ... + x(aₙ₋₁ + x·aₙ)...))
Implementation (n multiplications, n additions, sequential):
double horner(double x, double* a, int n) {
double p = a[n];
for (int i = n-1; i >= 0; i--)
p = p * x + a[i];
return p;
}
// Error: bounded by n × ulp(P(x)) approximately
// Limitation: sequential → poor pipeline utilization on superscalar CPUs
Estrin’s Method (parallelizable)
// Group into pairs, then pairs-of-pairs
// P(x) = (a₀ + a₁x) + x²(a₂ + a₃x) + x⁴(a₄ + a₅x) + ...
// Then combine pairs: P = pair₀ + x²·pair₁ + x⁴·pair₂ + ...
// Allows multiple FP units to work in parallel
// Slightly higher error than Horner (more rounding steps) but faster on pipelined hardware
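The grouping above, sketched for a fixed degree-7 polynomial (an illustrative form; production kernels hard-code the tree for their specific degree, and the name `estrin7` is hypothetical):

```c
// Estrin evaluation of P(x) = a0 + a1*x + ... + a7*x^7:
// the four (a_i + a_{i+1}*x) pairs are independent, so a superscalar
// core can issue them in parallel, unlike Horner's serial chain.
static double estrin7(double x, const double a[8]) {
    double x2 = x * x;
    double x4 = x2 * x2;
    double p01 = a[0] + a[1] * x;
    double p23 = a[2] + a[3] * x;
    double p45 = a[4] + a[5] * x;
    double p67 = a[6] + a[7] * x;
    double lo = p01 + x2 * p23;   // a0 + a1*x + a2*x^2 + a3*x^3
    double hi = p45 + x2 * p67;   // a4 + a5*x + a6*x^2 + a7*x^3
    return lo + x4 * hi;
}
```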
Approximation Methods
Chebyshev Polynomials (near-minimax)
T₀(x) = 1
T₁(x) = x
Tₙ(x) = 2x·Tₙ₋₁(x) - Tₙ₋₂(x)
Key property: Tₙ oscillates between ±1 on [-1,1] with n+1 equioscillation points
→ Chebyshev expansion gives near-minimax polynomial approximation
Process:
1. Map interval [a,b] to [-1,1]: t = 2(x-a)/(b-a) - 1
2. Compute Chebyshev coefficients: cₙ = (2/N) Σₖ f(xₖ)Tₙ(xₖ), with nodes xₖ = cos(π(k+½)/N)
3. Truncate series at desired precision
4. Convert to standard polynomial for Horner evaluation
Better than Taylor: equioscillates error vs. one-sided error buildup
Remez Algorithm (true minimax)
Goal: find polynomial P of degree n minimizing max|f(x) - P(x)| on [a,b]
Key theorem (Chebyshev equioscillation): P* is the minimax polynomial
iff the error f(x) - P*(x) equioscillates at n+2 points.
Remez exchange algorithm (iterative):
1. Initialize with Chebyshev nodes as extremal points
2. Solve linear system for P that equioscillates at current n+2 nodes
3. Find actual extrema of f - P between nodes
4. Exchange nodes, repeat until convergence
Tools: Sollya (remez()), Maple (numapprox[minimax]), Mathematica (MiniMaxApproximation)
Typical error: degree-13 odd polynomial for sin on [-π/4, π/4] (the fdlibm kernel) ≈ 10⁻¹⁶ (double precision)
Rational Approximations
R(x) = P(x) / Q(x) where P, Q are polynomials
Advantage: same accuracy as polynomial at lower degree (Padé approximants)
converges where polynomials converge slowly (near singularities)
Disadvantage: one extra division; more complex error analysis
Good for: computing 1/x, x^(1/3), arctan near ±π/2
Range Reduction
Why It’s Needed
Polynomial approximations only work on small intervals.
Strategy: reduce x to x' ∈ [-δ, δ], then apply polynomial, then reconstruct.
Example for exp(x):
x = n·ln2 + r, where n = round(x/ln2), r = x - n·ln2 ∈ [-ln2/2, ln2/2]
exp(x) = 2ⁿ × exp(r) ← ldexp(exp(r), n) in C
exp(r) computed by polynomial on small interval
Cody-Waite Method (moderate range)
// For sin/cos: reduce to [-π/4, π/4]
// Store π/2 as two-part float for accuracy: PIO2_HI + PIO2_LO
// n = round(x / (π/2))
// r = x - n*PIO2_HI - n*PIO2_LO (two-step for accuracy)
// Then use sin/cos on r based on n mod 4
#define PIO2_HI 1.5707963267948966192e+00 // high part
#define PIO2_LO 6.123233995736765886130e-17 // low part (Cody-Waite)
Payne-Hanek Method (large arguments)
Problem: for large x (e.g., x = 1e16), r = x - n·(π/2) cancels nearly all of x's significant
bits, so the reduction needs far more bits of π/2 than a two-part Cody-Waite split carries.
Fix: an extended-precision precomputed table of the bits of 2/π (or 4/π).
Method:
1. Use many extra bits of 4/π stored as integer array
2. Multiply x by appropriate portion of the table
3. Extract remainder mod 1 → accurate r
Used in: fdlibm's __kernel_rem_pio2, inherited by glibc and most libm implementations
CORDIC Algorithm
Circular Mode (sin/cos/atan)
CORDIC (COordinate Rotation DIgital Computer):
Rotate vector (x, y) by angle θ using only shifts and additions.
Iteration: for i = 0, 1, 2, ..., n-1:
σᵢ = sign(remaining angle)
x_{i+1} = xᵢ - σᵢ · yᵢ · 2^(-i)
y_{i+1} = yᵢ + σᵢ · xᵢ · 2^(-i)
z_{i+1} = zᵢ - σᵢ · atan(2^(-i))
After n iterations (starting from (x₀, y₀) = (1, 0)):
x_n = A · cos(θ)   (gain A = ∏√(1 + 2^(-2i)) ≈ 1.6468)
y_n = A · sin(θ)
z_n ≈ 0
Scale factor: compensate the gain by multiplying results by 1/A ≈ 0.6073,
or start from x₀ = 1/A so that x_n = cos(θ), y_n = sin(θ) directly
CORDIC modes:
Rotation mode: compute cos(z), sin(z) from input angle z
Vectoring mode: compute atan(y/x), magnitude from input (x, y)
Used in: FPGAs, embedded processors, old x87 (original implementation), calculators
Implementation Strategy
High-Quality Library Design Pattern
Step 1: Range reduction
→ bring argument into small interval [-δ, δ]
→ choose δ small enough for target precision
→ must be done in extra precision (e.g., Cody-Waite two-step)
Step 2: Polynomial approximation
→ minimax (Remez) polynomial on reduced interval
→ evaluate with Horner (sequential) or Estrin (pipelined)
→ coefficients stored in C as double literals
Step 3: Reconstruction
→ recombine polynomial result with range reduction parameters
→ must use FMA or double-double when precision is tight
Step 4: Correct rounding (optional)
→ Ziv's multilevel strategy: first try at low precision,
if result is ambiguous (near midpoint), retry at higher precision
Error Budget Analysis
For double precision (52-bit mantissa), target: ≤ 1 ulp
Sources of error:
1. Approximation error: controlled by polynomial degree
2. Rounding in polynomial evaluation: Horner adds ~n ulp
3. Range reduction error: catastrophic if done naively for large x
4. Reconstruction rounding: use FMA if possible
Rule of thumb: work with 1-2 guard bits beyond target precision
use double-double or quad-double when needed
Reference: Key Algorithm Applications
| Function | Range Reduction | Method | Notes |
|---|---|---|---|
| exp(x) | n = round(x/ln2), r = x - n·ln2 | Polynomial + ldexp | n-bit table for speed |
| log(x) | factor out 2^n, normalize mantissa | Polynomial + n·log2 | Argument near 1 |
| sin/cos | Payne-Hanek (large x), Cody-Waite (small) | Polynomial on [-π/4, π/4] | Sign from quadrant |
| atan(x) | Reduce to [0, 1] via identity | Polynomial + identity | atan(x)=π/2-atan(1/x) |
| sqrt(x) | Normalize, extract exponent | Newton-Raphson refinement | IEEE mandates correct rounding |
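The Newton-Raphson refinement mentioned for sqrt, in its division-free reciprocal-square-root form (a sketch with an illustrative name `sqrt_newton`; IEEE-correct implementations add a final correctly-rounded adjustment that this omits, and convergence requires a reasonable starting estimate, e.g. from exponent halving or a lookup table):

```c
// Newton iteration on f(y) = 1/y^2 - x converges to y = 1/sqrt(x):
//   y <- y * (1.5 - 0.5 * x * y * y)
// Quadratic convergence: each step roughly doubles the correct bits.
static double sqrt_newton(double x, double y0) {
    double y = y0;  // initial estimate of 1/sqrt(x)
    for (int i = 0; i < 5; i++)
        y = y * (1.5 - 0.5 * x * y * y);
    return x * y;   // sqrt(x) = x * (1/sqrt(x)), no division needed
}
```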