Merged
126 changes: 126 additions & 0 deletions docs/cookbook/matrix-stats.md
@@ -2,6 +2,11 @@

Focused recipes for the `matrix-stats` module.

## Setup Notes

Use the Matrix BOM and add Groovy explicitly. `matrix-stats` declares Groovy as `compileOnly`, so it is
not pulled in transitively, and it no longer brings in Apache Commons Math as a runtime dependency.

## Correlation

Compute Pearson, Spearman, and Kendall correlations with the same entry point.
@@ -37,6 +42,86 @@ println(minMax['amount'])
println(zScore['amount'])
```

## Linear Algebra

Use `Linalg` for inverse, determinant, eigenvalues, SVD, and linear solves.

```groovy
import se.alipsa.matrix.core.Matrix
import se.alipsa.matrix.stats.linalg.Linalg

Matrix source = Matrix.builder()
  .columnNames(['x', 'y'])
  .rows([
    [4.0, 7.0],
    [2.0, 6.0]
  ])
  .types([Double, Double])
  .build()

println(Linalg.det(source))
println(Linalg.solve(source, [1.0, 0.0]))
println(Linalg.inverse(source).content())
```
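
For a 2×2 input like the one above, the results can be checked by hand. The following is a minimal plain-Groovy sketch of that arithmetic, not the `Linalg` implementation: the helpers `det2`, `inverse2`, and `solve2` are hypothetical names introduced here, and the sketch assumes the two builder rows are the rows of the matrix.

```groovy
// Hand check of the 2x2 example; these helpers are hypothetical, not part of matrix-stats.
List<List<BigDecimal>> a = [[4.0, 7.0], [2.0, 6.0]]

// Determinant of a 2x2 matrix [[a, b], [c, d]]: a*d - b*c
def det2 = { List<List<BigDecimal>> m -> m[0][0] * m[1][1] - m[0][1] * m[1][0] }

// Inverse of a 2x2 matrix: (1 / det) * [[d, -b], [-c, a]]
def inverse2 = { List<List<BigDecimal>> m ->
  BigDecimal d = det2(m)
  [[m[1][1] / d, -m[0][1] / d],
   [-m[1][0] / d, m[0][0] / d]]
}

// Solve m * x = b as x = inverse(m) * b (acceptable for a 2x2 check, not a general method)
def solve2 = { List<List<BigDecimal>> m, List<BigDecimal> b ->
  inverse2(m).collect { row -> row[0] * b[0] + row[1] * b[1] }
}

BigDecimal det = det2(a)
def inv = inverse2(a)
def x = solve2(a, [1.0, 0.0])

println(det)
println(inv)
println(x)
```

Under those assumptions the determinant is 4·6 − 7·2 = 10 and the solution of the system with right-hand side `[1.0, 0.0]` is (0.6, −0.2), which gives a quick sanity check for the `Linalg` calls.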

## Interpolation

Use `Interpolation.linear(...)` for explicit domains, evenly spaced series, or Matrix/Grid columns.

```groovy
import se.alipsa.matrix.core.Matrix
import se.alipsa.matrix.stats.interpolation.Interpolation

assert 5.0 == Interpolation.linear([0.0, 2.0, 4.0], [0.0, 10.0, 20.0], 1.0)
assert 30.0 == Interpolation.linear([10.0, 20.0, 40.0], 1.5)

Matrix points = Matrix.builder()
  .columnNames(['time', 'value'])
  .rows([
    [1.0, 3.0],
    [2.0, 6.0],
    [4.0, 12.0]
  ])
  .types([Double, Double])
  .build()

assert 9.0 == Interpolation.linear(points, 'time', 'value', 3.0)
```
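
The three assertions above follow from ordinary piecewise-linear interpolation. The sketch below reproduces them in plain Groovy without the library; `lerp` is a hypothetical helper, and treating the evenly spaced form as using zero-based index positions is an assumption inferred from the `30.0` result at `1.5`.

```groovy
// Plain-Groovy sketch of piecewise-linear interpolation.
// `lerp` is a hypothetical helper, not the matrix-stats API.
def lerp = { List<BigDecimal> xs, List<BigDecimal> ys, BigDecimal x ->
  // Locate the segment [xs[i], xs[i + 1]] containing x (xs must be strictly ascending)
  Integer i = (0..<(xs.size() - 1)).find { int k -> x >= xs[k] && x <= xs[k + 1] }
  if (i == null) {
    throw new IllegalArgumentException("x = ${x} is outside the domain")
  }
  // Fraction of the way through the segment, then the same fraction between the y values
  BigDecimal t = (x - xs[i]) / (xs[i + 1] - xs[i])
  ys[i] + t * (ys[i + 1] - ys[i])
}

// Explicit domain: halfway between x = 0 and x = 2
def explicit = lerp([0.0, 2.0, 4.0], [0.0, 10.0, 20.0], 1.0)

// Evenly spaced series: assume the implicit domain is the index positions 0, 1, 2
List<BigDecimal> series = [10.0, 20.0, 40.0]
def spaced = lerp((0..<series.size()).collect { it as BigDecimal }, series, 1.5)

// Column-style data: time -> value, evaluated at time = 3
def column = lerp([1.0, 2.0, 4.0], [3.0, 6.0, 12.0], 3.0)

println([explicit, spaced, column])
```

Each call lands halfway through a segment, so the results are the midpoints of the neighbouring values: 5, 30, and 9.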

## Formula Models

Build a model frame first, then dispatch to a named fit method through `FitRegistry`.
When using the Groovy-native closure DSL, `Formula.build { y | x + group }`, keep it
outside `@CompileStatic` callers or wrap it in a `@CompileDynamic` helper because bare
column names use dynamic `propertyMissing` lookup. In that DSL, write intercept removal
as `noIntercept + x` rather than `0 + x`.

```groovy
import se.alipsa.matrix.core.Matrix
import se.alipsa.matrix.stats.formula.ModelFrame
import se.alipsa.matrix.stats.formula.NaAction
import se.alipsa.matrix.stats.regression.FitRegistry

Matrix data = Matrix.builder()
  .columnNames(['y', 'x', 'group'])
  .rows([
    [1.0, 1.0, 'A'],
    [2.0, 2.0, 'B'],
    [3.0, 3.0, 'A'],
    [4.0, 4.0, 'B']
  ])
  .types([BigDecimal, BigDecimal, String])
  .build()

def frame = ModelFrame.of('y ~ x + group', data)
  .naAction(NaAction.OMIT)
  .evaluate()

def fit = FitRegistry.instance().get('lm').fit(frame)
println(fit.predictorNames)
println(fit.rSquared)
```

## T-tests

Use Welch for unequal variances and Student for pooled-variance or paired designs.
@@ -74,6 +159,47 @@ println("F = ${anova.fValue}, p = ${anova.pValue}")
println(anova.evaluate() ? 'Reject equal means' : 'Fail to reject equal means')
```

## Native Distributions

Use the native distribution classes directly when you need CDF, quantile, or exact probability helpers.

```groovy
import se.alipsa.matrix.stats.distribution.HypergeometricDistribution
import se.alipsa.matrix.stats.distribution.NormalDistribution

def normal = new NormalDistribution(2, 1.5)
println(normal.cumulativeProbability(2))
println(normal.inverseCumulativeProbability(0.975))

def hyper = new HypergeometricDistribution(37, 21, 17)
println(hyper.probability(10))
println(hyper.cumulativeProbability(10))
```

## Numerical Solvers

Use Brent for scalar roots and the simplex solver for equality-form linear programs.

```groovy
import se.alipsa.matrix.stats.solver.BrentSolver
import se.alipsa.matrix.stats.solver.LinearProgramSolver
import se.alipsa.matrix.stats.solver.UnivariateObjective

def root = BrentSolver.solve(
  { double x -> x * x - 2.0d } as UnivariateObjective,
  0.0,
  2.0,
  1.0e-12,
  1.0e-12,
  100
)
println(root.rootValue)

def solution = LinearProgramSolver.minimize([1.0, 2.0], [[1.0, 1.0]], [1.0])
println(solution.pointValues)
println(solution.objectiveValue)
```
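
Brent's method is a bracketing root finder that combines bisection with faster interpolation steps. To illustrate only the bracketing idea, here is a minimal plain-Groovy bisection sketch for the same function and interval. It is a simpler technique than Brent and not the `BrentSolver` implementation; `bisect` and its parameter order are hypothetical.

```groovy
// Plain bisection on a bracketing interval; simpler (and slower) than Brent's method.
// `bisect` is a hypothetical helper, not part of matrix-stats.
def bisect = { Closure<Double> f, double lo, double hi, double tol, int maxIter ->
  double fLo = f(lo)
  if (fLo * f(hi) > 0) {
    throw new IllegalArgumentException('Root is not bracketed by [lo, hi]')
  }
  for (int i = 0; i < maxIter && (hi - lo) > tol; i++) {
    double mid = (lo + hi) / 2.0d
    double fMid = f(mid)
    if (fLo * fMid <= 0) {
      hi = mid          // sign change lies in [lo, mid]
    } else {
      lo = mid          // sign change lies in [mid, hi]
      fLo = fMid
    }
  }
  (lo + hi) / 2.0d
}

// Same objective and bracket as the BrentSolver call: x^2 - 2 on [0, 2]
double root = bisect({ double x -> x * x - 2.0d }, 0.0d, 2.0d, 1.0e-12d, 100)
println(root)
```

The interval halves on every iteration, so a bracket of width 2 reaches the `1.0e-12` tolerance in roughly 41 steps, well within the iteration cap of 100; the result approximates the square root of 2.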

## Linear Regression

Fit a simple regression and make predictions.
18 changes: 5 additions & 13 deletions docs/python-comparison.md
@@ -139,6 +139,7 @@ Matrix.rolling(window: 30).apply { it.max() - it.min() } // Custom rolling
| **Regression** | Linear, Polynomial, Quantile, Logistic, Ridge, LASSO, ElasticNet | via statsmodels/sklearn |
| **T-tests** | Welch, Student, paired, one-sample | scipy.stats |
| **ANOVA** | One-way | scipy.stats |
| **Linear algebra** | `matrix-stats` `Linalg` facade (inverse, det, solve, SVD, eigenvalues) | `numpy.linalg`, `scipy.linalg` |
| **Clustering** | K-means, K-means++, DBSCAN (via Smile) | sklearn comparable |
| **Distributions** | Normal, Exponential, Gamma, Beta, Poisson, etc. + fitting | scipy.stats comparable |
| **Classification** | Random Forest, Decision Trees (via Smile) | sklearn comparable |
@@ -172,33 +173,24 @@ The matrix-smile module provides comprehensive ML capabilities via integration w
### NumPy/SciPy Advantages

1. **Broadcasting** - Automatic shape expansion for operations
2. **Linear algebra** - `np.linalg` (eigenvalues, SVD, matrix inverse, determinant)
2. **Broader linear algebra ecosystem** - `np.linalg`, `scipy.linalg`, sparse matrices, factorizations
3. **FFT** - Fast Fourier Transform
4. **Signal processing** - scipy.signal
5. **Optimization** - scipy.optimize (minimize, curve fitting)
6. **Interpolation** - scipy.interpolate
6. **Broader interpolation families** - scipy.interpolate goes far beyond the current linear interpolation helpers in `matrix-stats`

### Potential Improvements for Matrix

**Medium effort:**

```groovy
// 1. Linear algebra module (HIGH VALUE for scientific computing)
import se.alipsa.matrix.linalg.Linalg

Linalg.inverse(matrix) // Matrix inverse
Linalg.det(matrix) // Determinant
Linalg.eigenvalues(matrix) // Eigenvalue decomposition
Linalg.svd(matrix) // Singular value decomposition
Linalg.solve(A, b) // Solve Ax = b

// 2. Cumulative operations (EASY)
// 1. Cumulative operations (EASY)
table['value'].cumsum() // Cumulative sum
table['value'].cumprod() // Cumulative product
table['value'].cummax() // Running maximum
table['value'].cummin() // Running minimum

// 3. Diff/shift operations (EASY)
// 2. Diff/shift operations (EASY)
table['value'].diff() // First difference
table['value'].diff(2) // Second difference
table['value'].shift(1) // Lag by 1
13 changes: 6 additions & 7 deletions docs/tutorial/1-introduction.md
@@ -37,14 +37,14 @@ Whether you're performing data analysis, building data processing pipelines, or

## Installation and Setup

The Matrix library is designed to work with any Groovy 4.x version and requires JDK 21 or higher. You can add the library to your project using your preferred build system.
The Matrix library targets Groovy 5 and requires JDK 21. You can add the library to your project using your preferred build system.

### Gradle Configuration

For Gradle projects, you can use the Bill of Materials (BOM) to simplify dependency management:

```groovy
implementation(platform('se.alipsa.matrix:matrix-bom:2.2.0'))
implementation(platform('se.alipsa.matrix:matrix-bom:2.4.0'))
implementation('se.alipsa.matrix:matrix-core')
```

@@ -66,7 +66,7 @@ For Maven projects, add the following to your `pom.xml`:
<dependency>
<groupId>se.alipsa.matrix</groupId>
<artifactId>matrix-bom</artifactId>
<version>2.2.0</version>
<version>2.4.0</version>
<type>pom</type>
<scope>import</scope>
</dependency>
@@ -87,15 +87,15 @@ If you're using the Matrix library from Java, you'll need to add the Groovy core

```groovy
// For Gradle
implementation('org.apache.groovy:groovy:5.0.4')
implementation('org.apache.groovy:groovy:5.0.5')
```

```xml
<!-- For Maven -->
<dependency>
<groupId>org.apache.groovy</groupId>
<artifactId>groovy</artifactId>
<version>5.0.3</version>
<version>5.0.5</version>
</dependency>
```

@@ -105,7 +105,7 @@ The Matrix project consists of multiple modules, each providing specific functio

1. **matrix-core**: The heart of the library, containing the Matrix and Grid classes along with utility classes for basic statistics and data conversion.

2. **matrix-stats**: Advanced statistical methods and tests including correlations, normalization, linear regression, t-tests, and more.
2. **matrix-stats**: Advanced statistical methods and tests including correlations, normalization, formula/model-frame evaluation, regression, linear algebra, interpolation, distributions, solvers, t-tests, time-series diagnostics, and more.

3. **matrix-datasets**: Common datasets used in data science, similar to those available in R and Python.

@@ -132,4 +132,3 @@ The Matrix project consists of multiple modules, each providing specific functio
In the following sections, we'll explore each of these modules in detail, starting with the core functionality provided by the matrix-core module.

Go to [previous section](outline.md) | Go to [next section](2-matrix-core.md)

14 changes: 7 additions & 7 deletions docs/tutorial/10-matrix-bom.md
@@ -19,7 +19,7 @@ This approach simplifies dependency management and helps avoid version conflicts
To use the Matrix BOM in a Gradle project, add the following to your build script:

```groovy
implementation(platform('se.alipsa.matrix:matrix-bom:2.2.0'))
implementation(platform('se.alipsa.matrix:matrix-bom:2.4.0'))
implementation('se.alipsa.matrix:matrix-core')
implementation('se.alipsa.matrix:matrix-spreadsheet')
// Add other matrix modules as needed without specifying versions
@@ -40,7 +40,7 @@ To use the Matrix BOM in a Maven project, add the following to your `pom.xml` fi
<dependency>
<groupId>se.alipsa.matrix</groupId>
<artifactId>matrix-bom</artifactId>
<version>2.2.0</version>
<version>2.4.0</version>
<type>pom</type>
<scope>import</scope>
</dependency>
@@ -92,10 +92,10 @@ repositories {

dependencies {
// Import the BOM
implementation(platform('se.alipsa.matrix:matrix-bom:2.3.0'))
implementation(platform('se.alipsa.matrix:matrix-bom:2.4.0'))

// Add Groovy
implementation 'org.apache.groovy:groovy:5.0.4'
implementation 'org.apache.groovy:groovy:5.0.5'

// Add Matrix modules without specifying versions
implementation 'se.alipsa.matrix:matrix-core'
@@ -118,15 +118,15 @@ dependencies {
<version>1.0.0</version>

<properties>
<groovy.version>5.0.3</groovy.version>
<groovy.version>5.0.5</groovy.version>
</properties>

<dependencyManagement>
<dependencies>
<dependency>
<groupId>se.alipsa.matrix</groupId>
<artifactId>matrix-bom</artifactId>
<version>2.3.0</version>
<version>2.4.0</version>
<type>pom</type>
<scope>import</scope>
</dependency>
@@ -219,4 +219,4 @@ The Matrix BOM module simplifies dependency management for projects that use mul

In the next section, we'll explore the matrix-parquet module, which provides support for the Apache Parquet file format.

Go to [previous section](9-matrix-sql.md) | Go to [next section](11-matrix-parquet.md) | Back to [outline](outline.md)
Go to [previous section](9-matrix-sql.md) | Go to [next section](11-matrix-parquet.md) | Back to [outline](outline.md)
6 changes: 3 additions & 3 deletions docs/tutorial/11-matrix-parquet.md
@@ -13,8 +13,8 @@ To use the matrix-parquet module, add the following dependencies to your project
### Gradle Configuration

```groovy
implementation 'org.apache.groovy:groovy:5.0.4'
implementation platform('se.alipsa.matrix:matrix-bom:2.3.0')
implementation 'org.apache.groovy:groovy:5.0.5'
implementation platform('se.alipsa.matrix:matrix-bom:2.4.0')
implementation 'se.alipsa.matrix:matrix-core'
implementation 'se.alipsa.matrix:matrix-parquet'
```
@@ -178,4 +178,4 @@ The matrix-parquet module provides a convenient way to work with Parquet files i

In the next section, we'll explore the matrix-bigquery module, which provides functionality for interacting with Google BigQuery.

Go to [previous section](10-matrix-bom.md) | Go to [next section](12-matrix-bigquery.md) | Back to [outline](outline.md)
Go to [previous section](10-matrix-bom.md) | Go to [next section](12-matrix-bigquery.md) | Back to [outline](outline.md)
52 changes: 32 additions & 20 deletions docs/tutorial/12-matrix-bigquery.md
@@ -7,31 +7,43 @@ The Matrix BigQuery module lets you query BigQuery into a `Matrix`, manage datas
### Gradle

```groovy
implementation 'org.apache.groovy:groovy:5.0.4'
implementation 'se.alipsa.matrix:matrix-core:3.6.0'
implementation 'se.alipsa.matrix:matrix-bigquery:0.6.0'
implementation 'org.apache.groovy:groovy:5.0.5'
implementation platform('se.alipsa.matrix:matrix-bom:2.4.0')
implementation 'se.alipsa.matrix:matrix-core'
implementation 'se.alipsa.matrix:matrix-bigquery'
```

### Maven

```xml
<dependencies>
<dependency>
<groupId>org.apache.groovy</groupId>
<artifactId>groovy</artifactId>
<version>5.0.4</version>
</dependency>
<dependency>
<groupId>se.alipsa.matrix</groupId>
<artifactId>matrix-core</artifactId>
<version>3.6.0</version>
</dependency>
<dependency>
<groupId>se.alipsa.matrix</groupId>
<artifactId>matrix-bigquery</artifactId>
<version>0.6.0</version>
</dependency>
</dependencies>
<project>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>se.alipsa.matrix</groupId>
<artifactId>matrix-bom</artifactId>
<version>2.4.0</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.groovy</groupId>
<artifactId>groovy</artifactId>
<version>5.0.5</version>
</dependency>
<dependency>
<groupId>se.alipsa.matrix</groupId>
<artifactId>matrix-core</artifactId>
</dependency>
<dependency>
<groupId>se.alipsa.matrix</groupId>
<artifactId>matrix-bigquery</artifactId>
</dependency>
</dependencies>
</project>
```

## Authentication
2 changes: 1 addition & 1 deletion docs/tutorial/13-matrix-charts.md
@@ -12,7 +12,7 @@ To use the matrix-charts module, add the following dependency to your project:
### Gradle Configuration

```groovy
implementation platform('se.alipsa.matrix:matrix-bom:2.2.0')
implementation platform('se.alipsa.matrix:matrix-bom:2.4.0')
implementation 'se.alipsa.matrix:charts'
implementation 'se.alipsa.matrix:core'
implementation 'se.alipsa.matrix:stats'