sm00thix
diff --git a/README.md b/README.md
Lines changed: 25 additions & 13 deletions
diff --git a/benchmarks/benchmark.py b/benchmarks/benchmark.py
Lines changed: 48 additions & 48 deletions
diff --git a/cvmatrix/__init__.py b/cvmatrix/__init__.py
Lines changed: 1 addition & 1 deletion
@@ -14,9 +14,12 @@
 
 [![Package Status](https://github.com/Sm00thix/CVMatrix/actions/workflows/package_workflow.yml/badge.svg)](https://github.com/Sm00thix/CVMatrix/actions/workflows/package_workflow.yml)
 
-The [`cvmatrix`](https://pypi.org/project/cvmatrix/) package implements the fast cross-validation algorithms by Engstrøm [[1]](#references) for computation of training set $\mathbf{X}^{\mathbf{T}}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{Y}$ in a cross-validation setting. In addition to correctly handling arbitrary row-wise pre-processing, the algorithms allow for and efficiently and correctly handle any combination of column-wise centering and scaling of `X` and `Y` based on training set statistical moments.
+The [`cvmatrix`](https://pypi.org/project/cvmatrix/) package implements the fast cross-validation algorithms by Engstrøm and Jensen [[1]](#references) for computation of training set $\mathbf{X}^{\mathbf{T}}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{Y}$ in a cross-validation setting. In addition to correctly handling arbitrary row-wise pre-processing, the algorithms efficiently and correctly handle any combination of column-wise centering and scaling of `X` and `Y` based on training set statistical moments.
 
-For an implementation of the fast cross-validation algorithms combined with Improved Kernel Partial Least Squares [[2]](#references), see the Python package [`ikpls`](https://pypi.org/project/ikpls/).
+For an implementation of the fast cross-validation algorithms combined with Improved Kernel Partial Least Squares [[2]](#references), see the Python package [`ikpls`](https://pypi.org/project/ikpls/) by Engstrøm et al. [[3]](#references).
+
+## NEW IN 2.0.0: Weighted CVMatrix
+The `cvmatrix` software package now also features **weighted matrix products** $\mathbf{X}^{\mathbf{T}}\mathbf{W}\mathbf{Y}$ **without increasing time or space complexity compared to the unweighted case**. This is due to a generalization of the algorithms by Engstrøm and Jensen [[1]](#references). A new article formally describing the generalization is to be announced.
 
 ## Installation
 
@@ -46,23 +49,27 @@ For an implementation of the fast cross-validation algorithms combined with Impr
 > Y = np.random.uniform(size=(N, M)) # Random Y data
 > folds = np.arange(100) % 5 # 5-fold cross-validation
 >
+> # Weights must be non-negative and the sum of weights for any training partition must
+> # be greater than zero.
+> weights = np.random.uniform(size=(N,)) + 0.1
+>
 > # Instantiate CVMatrix
 > cvm = CVMatrix(
 >     folds=folds,
->     center_X=True,
->     center_Y=True,
->     scale_X=True,
->     scale_Y=True,
+>     center_X=True, # Center around the weighted mean of X.
+>     center_Y=True, # Center around the weighted mean of Y.
+>     scale_X=True, # Scale by the weighted standard deviation of X.
+>     scale_Y=True, # Scale by the weighted standard deviation of Y.
 > )
 > # Fit on X and Y
-> cvm.fit(X=X, Y=Y)
-> # Compute training set XTX and/or XTY for each fold
+> cvm.fit(X=X, Y=Y, weights=weights)
+> # Compute training set XTWX and/or XTWY for each fold
 > for fold in cvm.folds_dict.keys():
->     # Get both XTX and XTY
+>     # Get both XTWX and XTWY
 >     training_XTX, training_XTY = cvm.training_XTX_XTY(fold)
->     # Get only XTX
+>     # Get only XTWX
 >     training_XTX = cvm.training_XTX(fold)
->     # Get only XTY
+>     # Get only XTWY
 >     training_XTY = cvm.training_XTY(fold)
 
 ### Examples
@@ -87,5 +94,10 @@ Guidelines](https://github.com/Sm00thix/CVMatrix/blob/main/CONTRIBUTING.md).
 
 ## References
 
-1. [Engstrøm, O.-C. G. (2024). Shortcutting Cross-Validation: Efficiently Deriving Column-Wise Centered and Scaled Training Set $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$ Without Full Recomputation of Matrix Products or Statistical Moments](https://arxiv.org/abs/2401.13185)
-2. [Dayal, B. S., & MacGregor, J. F. (1997). Improved PLS algorithms. *Journal of Chemometrics*, 11(1), 73-85.](https://doi.org/10.1002/(SICI)1099-128X(199701)11:1%3C73::AID-CEM435%3E3.0.CO;2-%23?)
+1. [Engstrøm, O.-C. G. and Jensen, M. H. (2025). Fast partition-based cross-validation with centering and scaling for $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$. *Journal of Chemometrics*, 39(3).](https://doi.org/10.1002/cem.70008)
+2. [Dayal, B. S. and MacGregor, J. F. (1997). Improved PLS algorithms. *Journal of Chemometrics*, 11(1), 73-85.](https://doi.org/10.1002/(SICI)1099-128X(199701)11:1%3C73::AID-CEM435%3E3.0.CO;2-%23?)
+3. [Engstrøm, O.-C. G., Dreier, E. S., Jespersen, B. M., and Pedersen, K. S. (2024). IKPLS: Improved Kernel Partial Least Squares and Fast Cross-Validation Algorithms for Python with CPU and GPU Implementations Using NumPy and JAX. *Journal of Open Source Software*, 9(99).](https://doi.org/10.21105/joss.06533)
+
+## Funding
+- Until May 31st 2025, this work was carried out as part of an industrial Ph.D. project receiving funding from [FOSS Analytical A/S](https://www.fossanalytics.com/) and [The Innovation Fund Denmark](https://innovationsfonden.dk/en) (grant number 1044-00108B).
+- From June 1st 2025 and onward, this work is sponsored by [FOSS Analytical A/S](https://www.fossanalytics.com/).
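Review note on the weighted products described in the README changes above: the following is a minimal NumPy sketch of what a training-set $\mathbf{X}^{\mathbf{T}}\mathbf{W}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{W}\mathbf{Y}$ are for one fold, and why they can be obtained by subtracting the validation part from a product computed once over all samples. It is not the `cvmatrix` implementation. It covers only the uncentered, unscaled case, and the helper names `training_products` and `naive_training_products` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M, P = 20, 3, 2, 5
X = rng.random((N, K))
Y = rng.random((N, M))
w = rng.random(N) + 0.1   # strictly positive sample weights
folds = np.arange(N) % P  # fold label of each sample

# Weighted products over the full dataset, computed once.
# (X * w[:, None]).T @ Y equals X.T @ diag(w) @ Y without forming diag(w).
XTWX_all = (X * w[:, None]).T @ X
XTWY_all = (X * w[:, None]).T @ Y


def training_products(fold):
    """Training-set products via subtraction of the validation part (hypothetical helper)."""
    val = folds == fold
    Xv, Yv, wv = X[val], Y[val], w[val]
    XTWX_val = (Xv * wv[:, None]).T @ Xv
    XTWY_val = (Xv * wv[:, None]).T @ Yv
    return XTWX_all - XTWX_val, XTWY_all - XTWY_val


def naive_training_products(fold):
    """Reference: recompute the products from the training rows only."""
    train = folds != fold
    Xt, Yt, wt = X[train], Y[train], w[train]
    return (Xt * wt[:, None]).T @ Xt, (Xt * wt[:, None]).T @ Yt


# Both routes agree for every fold.
for fold in range(P):
    fast_XTWX, fast_XTWY = training_products(fold)
    naive_XTWX, naive_XTWY = naive_training_products(fold)
    assert np.allclose(fast_XTWX, naive_XTWX)
    assert np.allclose(fast_XTWY, naive_XTWY)
```

The subtraction touches only the validation rows of each fold, which is why weighting does not have to change the cost relative to the unweighted case. Weighted centering and scaling need additional correction terms that this sketch does not attempt.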
@@ -10,6 +10,7 @@
 Author: Ole-Christian Galbo Engstrøm
 E-mail: ole.e@di.ku.dk
 """
+
 import os
 import sys
 
@@ -31,42 +32,32 @@
 
 
 def save_result_to_csv(
-        model,
-        P,
-        N,
-        K,
-        M,
-        center_X,
-        center_Y,
-        scale_X,
-        scale_Y,
-        time,
-        version
-    ):
+    model, P, N, K, M, center_X, center_Y, scale_X, scale_Y, time, version
+):
     try:
         with open("benchmark_results.csv", "x") as f:
-            f.write(
-                "model,P,N,K,M,"
-                "center_X,center_Y,scale_X,scale_Y,time,version\n"
-            )
+            f.write("model,P,N,K,M,center_X,center_Y,scale_X,scale_Y,time,version\n")
     except FileExistsError:
         pass
     with open("benchmark_results.csv", "a") as f:
         f.write(
             f"{model},{P},{N},{K},{M},"
             f"{center_X},{center_Y},{scale_X},{scale_Y},"
-            f"{time},{version}\n")
+            f"{time},{version}\n"
+        )
+
 
 def execute_algorithm(
-        model_class: Union[NaiveCVMatrix, CVMatrix],
-        cv_splits: Iterable[Hashable],
-        center_X: bool,
-        center_Y: bool,
-        scale_X: bool,
-        scale_Y: bool,
-        X: np.ndarray,
-        Y: np.ndarray,
-    ):
+    model_class: Union[NaiveCVMatrix, CVMatrix],
+    cv_splits: Iterable[Hashable],
+    center_X: bool,
+    center_Y: bool,
+    scale_X: bool,
+    scale_Y: bool,
+    X: np.ndarray,
+    Y: np.ndarray,
+    weights: Union[None, np.ndarray],
+):
     """
     Execute the computation of the training set matrices
     :math:`\mathbf{X}^{\mathbf{T}}\mathbf{X}`
@@ -80,7 +71,7 @@ def execute_algorithm(
 
     cv_splits : Iterable[Hashable]
         The cross-validation splits.
-    
+
     center_X : bool
         Whether to center `X`.
 
@@ -98,11 +89,14 @@ def execute_algorithm(
 
     Y : np.ndarray
         The target matrix with shape (N, M).
+
+    weights : Union[None, np.ndarray]
+        The weights for the samples, if any. If None, no weights are used.
     """
 
     # Create the model
     model = model_class(
-        cv_splits=cv_splits,
+        folds=cv_splits,
         center_X=center_X,
         center_Y=center_Y,
         scale_X=scale_X,
@@ -112,31 +106,34 @@ def execute_algorithm(
     )
 
     # Fit the model
-    model.fit(X, Y)
+    model.fit(X, Y, weights)
 
     # Compute the training set matrices
-    for fold in model.val_folds_dict.keys():
+    for fold in model.folds_dict.keys():
         model.training_XTX_XTY(fold)
 
-if __name__ == '__main__':
-    seed = 42 # Seed for reproducibility
+
+if __name__ == "__main__":
+    seed = 42  # Seed for reproducibility
     rng = np.random.default_rng(seed=seed)
-    N = 100000 # 100k samples
-    K = 500 # 500 features
-    M = 10 # 10 targets
-    dtype = np.float64 # Data type
-    X = rng.random((N, K), dtype=dtype) # Random X matrix
-    Y = rng.random((N, M), dtype=dtype) # Random Y matrix
-    cv_splits = np.arange(N) # We can use mod P for P-fold cross-validation
+    N = 100000  # 100k samples
+    K = 500  # 500 features
+    M = 10  # 10 targets
+    dtype = np.float64  # Data type
+    X = rng.random((N, K), dtype=dtype)  # Random X matrix
+    Y = rng.random((N, M), dtype=dtype)  # Random Y matrix
+    # weights = rng.random((N,), dtype=dtype)  # Random weights
+    weights = None
+    cv_splits = np.arange(N)  # We can use mod P for P-fold cross-validation
     center_Xs = [True, False]
     center_Ys = [True, False]
     scale_Xs = [True, False]
     scale_Ys = [True, False]
     Ps = [3, 5, 10, 100, 1000, 10000, 100000]
 
     for center_X, center_Y, scale_X, scale_Y, P in product(
-            center_Xs, center_Ys, scale_Xs, scale_Ys, Ps
-        ):
+        center_Xs, center_Ys, scale_Xs, scale_Ys, Ps
+    ):
         print(
             f"P={P}, "
             f"center_X={center_X}, center_Y={center_Y}, "
@@ -152,8 +149,9 @@ def execute_algorithm(
                 scale_Y=scale_Y,
                 X=X,
                 Y=Y,
+                weights=weights,
             ),
-            number=1
+            number=1,
         )
         print(f"CVMatrix, Time: {time:.2f} seconds")
         save_result_to_csv(
@@ -167,12 +165,13 @@ def execute_algorithm(
             scale_X,
             scale_Y,
             time,
-            __version__
+            __version__,
         )
 
-        if (center_X == center_Y == scale_X == scale_Y or
-            center_X == center_Y == True and
-            scale_X == scale_Y == False
+        if (
+            center_X == center_Y == scale_X == scale_Y
+            # ... or centering of both X and Y without any scaling.
+            or (center_X and center_Y and not scale_X and not scale_Y)
+        ):
             time = timeit(
                 stmt=lambda: execute_algorithm(
@@ -184,8 +183,9 @@ def execute_algorithm(
                     scale_Y=scale_Y,
                     X=X,
                     Y=Y,
+                    weights=weights,
                 ),
-                number=1
+                number=1,
             )
             print(f"NaiveCVMatrix, Time: {time:.2f} seconds")
             print()
@@ -200,5 +200,5 @@ def execute_algorithm(
                 scale_X,
                 scale_Y,
                 time,
-                __version__
+                __version__,
             )
@@ -1 +1 @@
-__version__ = '1.0.2.post2'
+__version__ = "2.0.0"
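
Review note on the weights constraint stated in the README comment ("Weights must be non-negative and the sum of weights for any training partition must be greater than zero"): a small sketch of how a caller might check this before fitting, using the same modulo-based fold assignment as the README and the benchmark. `check_weights` is a hypothetical helper and not part of `cvmatrix`. Whether the package performs an equivalent check internally is not shown in this diff.

```python
import numpy as np


def check_weights(weights, folds):
    """Hypothetical helper: raise ValueError if the stated constraint is violated."""
    weights = np.asarray(weights, dtype=float)
    folds = np.asarray(folds)
    if weights.shape != folds.shape:
        raise ValueError("weights and folds must have the same shape")
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    for fold in np.unique(folds):
        # The training partition of a fold is every sample outside that fold.
        if weights[folds != fold].sum() <= 0:
            raise ValueError(f"training weights for fold {fold!r} sum to zero")


N, P = 100, 5
folds = np.arange(N) % P          # P-fold cross-validation via modulo
rng = np.random.default_rng(42)
weights = rng.random(N) + 0.1     # strictly positive, as in the README example
check_weights(weights, folds)     # passes silently
```

Adding `0.1` to uniform draws, as the README does, guarantees strictly positive weights, so every training partition trivially has a positive weight sum.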