Roadmap

This roadmap document defines the direction the project is taking.

The first and most decisive part of the project is the implementation of native tensor abstractions backed by Apache Arrow. To get there, we first need to implement a number of smaller pieces across the Arx + IRx stack. Arx owns the surface front end (lexer, parser, docs, examples), while IRx owns AST definitions, semantic analysis, lowering, and code generation.

Data type support

ArxLang is based on the Kaleidoscope compiler, so for now it only implements the float data type.

To accept more data types, the language needs a way to specify the type of each variable and the return type of each function.

Implement native tensors

Native tensors now have an initial Arrow C++ backed implementation. Remaining work should continue to make runtime-shaped tensor values usable in more contexts, while preserving the same runtime-layout rules for every collection type that uses that approach.

Expand runtime-layout annotations beyond function and extern parameters once default values, ownership, and type checking are ready for local declarations and expression contexts.
Keep tensor semantics aligned with the Arrow-backed runtime rather than adding Arx-local lowering behavior.

DataFrames and Series

DataFrames are a distinct public collection abstraction for heterogeneous named columns. Static-schema values use dataframe[name: T, ...], column views use series[T], and literals are constructed with dataframe({...}).

Add the builtin dataframe[...] type.
Add the builtin series[T] type for typed DataFrame columns.
Add the builtin dataframe({...}) constructor for column-oriented literals.
Back DataFrame values with Arrow C++ arrow::Table.
Back Series values with Arrow C++ arrow::ChunkedArray.
Keep the MVP limited to fixed-width numeric and bool columns.
Add string, nullable, nested, temporal, and user-defined column support after the fixed-width MVP is stable.
Expand runtime-layout/schema annotations beyond function and extern parameters, applying the same behavior to both dataframe[...] and tensor[T, ...].
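As an illustration of the MVP rules listed above, here is a minimal Python sketch of the check a front end might run on a `dataframe({...})` literal against a static `dataframe[name: T, ...]` schema: column names must match, only fixed-width numeric and bool element types are accepted, and every column (a `series[T]`) must have the same length, which becomes the row count. The function `check_dataframe_literal` and the type names `i32`, `i64`, and `f32` are hypothetical; this is not the IRx API and does not build an `arrow::Table`.

```python
from typing import Dict, List

# Assumed MVP element types: fixed-width numeric and bool only.
# The type names are illustrative; Arx's actual spelling may differ.
MVP_TYPES = {"i32": int, "i64": int, "f32": float, "f64": float, "bool": bool}


def check_dataframe_literal(schema: Dict[str, str],
                            columns: Dict[str, List[object]]) -> int:
    """Validate a column-oriented literal against dataframe[name: T, ...].

    Returns the row count; raises TypeError or ValueError on mismatch.
    """
    # Reject column types outside the fixed-width MVP up front.
    for name, elem_type in schema.items():
        if elem_type not in MVP_TYPES:
            raise TypeError(f"column {name!r}: {elem_type!r} is not a fixed-width MVP type")
    # The literal must provide exactly the columns named in the schema.
    if set(columns) != set(schema):
        raise ValueError(f"literal columns {sorted(columns)} != schema columns {sorted(schema)}")

    row_count = 0
    for i, (name, values) in enumerate(columns.items()):
        expected = MVP_TYPES[schema[name]]
        for value in values:
            # bool is a subclass of int in Python, so bools are only accepted
            # in bool columns and numeric columns reject them explicitly.
            if isinstance(value, bool) != (expected is bool) or not isinstance(value, expected):
                raise TypeError(f"column {name!r}: {value!r} is not a {schema[name]}")
        # All columns must share one length: the dataframe's row count.
        if i == 0:
            row_count = len(values)
        elif len(values) != row_count:
            raise ValueError(f"column {name!r} has {len(values)} rows, expected {row_count}")
    return row_count
```

For example, `check_dataframe_literal({"x": "f64", "ok": "bool"}, {"x": [1.0, 2.5], "ok": [True, False]})` returns a row count of 2, while a `str` column or columns of unequal length are rejected. The sketch is deliberately strict (an integer literal in an `f64` column is rejected); whether Arx should coerce such literals is not specified here.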

Type System Follow-ups

Add parser-level support for optional list, tensor, series, and dataframe size/shape annotations once the surface-syntax restrictions are ready to change.
Add runtime check sidecars for assigning unknown-size values to sized targets, covering list length, tensor shape, series length, and dataframe row count. A sized annotation must not become trusted metadata until its runtime check has passed.
Add support for partial tensor shape constraints using ellipsis, such as tensor[f64, 2, ...], tensor[f64, ..., 3], and tensor[f64, 2, ..., 3].
Add symbolic shape variables for generic algorithms, such as fn dot[N](a: tensor[f64, N], b: tensor[f64, N]).
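The shape follow-ups above share one matching step. The Python sketch below is a hypothetical model of it, not IRx code. A shape pattern is a list of fixed sizes, symbolic variable names (strings such as `"N"`), and at most one `...`; the element type (e.g. `f64`) is left out. `match_shape` binds symbolic variables consistently across all parameters of a call, and `check_sized` stands in for a runtime check sidecar that must pass before a sized annotation is trusted. All function names are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical pattern encoding: int = fixed size, str = symbolic variable,
# Ellipsis (...) = zero or more unconstrained dimensions.


def match_shape(pattern: Sequence[object], shape: Tuple[int, ...],
                bindings: Dict[str, int]) -> bool:
    """Check `shape` against `pattern`; on success, record symbolic bindings."""
    if pattern.count(...) > 1:
        raise ValueError("at most one ellipsis is allowed in a shape pattern")
    if ... in pattern:
        cut = pattern.index(...)
        head, tail = pattern[:cut], pattern[cut + 1:]
        if len(shape) < len(head) + len(tail):
            return False
        # Align `head` with the leading dims and `tail` with the trailing dims.
        pairs = list(zip(head, shape[:len(head)]))
        pairs += list(zip(tail, shape[len(shape) - len(tail):]))
    else:
        if len(shape) != len(pattern):
            return False
        pairs = list(zip(pattern, shape))

    trial = dict(bindings)  # only commit bindings if the whole pattern matches
    for dim, size in pairs:
        if isinstance(dim, str):
            # The first occurrence binds the variable; later ones must agree.
            if trial.setdefault(dim, size) != size:
                return False
        elif dim != size:
            return False
    bindings.update(trial)
    return True


def check_call(param_patterns: List[Sequence[object]],
               arg_shapes: List[Tuple[int, ...]]) -> Dict[str, int]:
    """Bind shape variables such as N across all arguments of one call."""
    bindings: Dict[str, int] = {}
    for pattern, shape in zip(param_patterns, arg_shapes):
        if not match_shape(pattern, shape, bindings):
            raise ValueError(f"shape {shape} does not satisfy {list(pattern)} "
                             f"with bindings {bindings}")
    return bindings


def check_sized(runtime_shape: Tuple[int, ...],
                target_pattern: Sequence[object]) -> Tuple[int, ...]:
    """Runtime sidecar: an unknown-size value may flow into a sized target
    only after this check passes."""
    if not match_shape(target_pattern, runtime_shape, {}):
        raise ValueError(f"runtime shape {runtime_shape} does not satisfy "
                         f"{list(target_pattern)}")
    return runtime_shape
```

Modelling `fn dot[N](a: tensor[f64, N], b: tensor[f64, N])`, `check_call([["N"], ["N"]], [(4,), (4,)])` binds `N` to 4, while argument shapes `(4,)` and `(5,)` are rejected. A pattern like `[2, ..., 3]` (for `tensor[f64, 2, ..., 3]`) accepts both `(2, 3)` and `(2, 5, 3)`, because the ellipsis may match zero dimensions.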
