Nonparametric-Bayesian-Multi-Facet-Clustering-for-Longitudinal-Data

Code for the UAI paper "Nonparametric Bayesian Multi-Facet Clustering for Longitudinal Data".

Author: Luwei Wang

We implemented two example time series models for multi-facet clustering:

Nonlinear growth model (single-dimensional, B-spline-based, MFNLG, allows missing values)
- Facets: intercept at time T; shape; noise
Vector autoregressive model (multi-dimensional, one lag, MFVAR, allows varying time series lengths)
- Facets: intercept vector; transition matrix (interaction); noise vector
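To make the three MFVAR facets concrete, the sketch below simulates toy data in the N*D*T layout from a one-lag vector autoregression. It is only an illustration and not part of NPBayesMFVI: the function name `simulate_var1`, the independent Gaussian noise per dimension and the choice of starting each series at its intercept are all assumptions, and the paper's exact generative model may differ.

```python
import numpy as np

def simulate_var1(intercepts, transitions, noise_sd, T, seed=0):
    """Simulate x_t = a + B @ x_{t-1} + eps_t for N individuals (hypothetical helper).

    intercepts:  (N, D)    intercept vector per individual
    transitions: (N, D, D) transition matrix per individual
    noise_sd:    (N, D)    noise standard deviation per dimension (assumed Gaussian)
    Returns an array of shape (N, D, T).
    """
    rng = np.random.default_rng(seed)
    N, D = intercepts.shape
    X = np.zeros((N, D, T))
    # Assumption: the first time point is the intercept plus noise.
    X[:, :, 0] = intercepts + noise_sd * rng.standard_normal((N, D))
    for t in range(1, T):
        # Batched matrix-vector product, one transition matrix per individual.
        mean = intercepts + np.einsum("nij,nj->ni", transitions, X[:, :, t - 1])
        X[:, :, t] = mean + noise_sd * rng.standard_normal((N, D))
    return X

# Two intercept clusters and two transition clusters, assigned independently,
# so an individual's cluster can differ from facet to facet.
rng = np.random.default_rng(1)
N, D, T = 6, 2, 50
a_clusters = np.array([[0.0, 0.0], [2.0, -1.0]])
B_clusters = np.array([[[0.5, 0.0], [0.0, 0.5]],
                       [[0.3, 0.2], [-0.2, 0.3]]])
a_labels = rng.integers(0, 2, size=N)
B_labels = rng.integers(0, 2, size=N)
toy_data = simulate_var1(a_clusters[a_labels], B_clusters[B_labels],
                         np.full((N, D), 0.1), T)
print(toy_data.shape)  # (6, 2, 50)
```

All series here have the same length; varying lengths, which MFVAR allows, are not covered by this sketch.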

Function descriptions

.MFNLG() and .MFVAR(): create an initial model from the data and the parameters specific to the chosen time series model.
.initialize(): initialize the model parameters; the hyperparameters of the priors can be changed by passing a dictionary.
.fit(): fit the model by mean-field variational inference.
- We recommend fitting the model in several parallel runs with different seeds, to reduce the risk of getting stuck in a poor local optimum.
- prune_threshold should not be greater than 1/trunc_level. trunc_level controls the maximum possible number of clusters and also affects the training speed. prune_threshold affects the learned number of clusters, and can be specified for each facet separately.
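The constraint between prune_threshold and trunc_level can be checked before a long fit is started. The helper below is a minimal sketch and not part of NPBayesMFVI: the name `check_prune_threshold` is hypothetical, the lower bound of 0 is an assumption, and accepting a list or tuple is only a convenience of this helper (how the package itself takes per-facet thresholds is not shown here).

```python
def check_prune_threshold(prune_threshold, trunc_level):
    """Raise ValueError if any threshold exceeds 1 / trunc_level (hypothetical helper)."""
    upper = 1.0 / trunc_level
    # Accept a single value or one value per facet.
    if isinstance(prune_threshold, (list, tuple)):
        thresholds = prune_threshold
    else:
        thresholds = [prune_threshold]
    for th in thresholds:
        # Assumption: thresholds are non-negative.
        if not 0.0 <= th <= upper:
            raise ValueError(f"prune_threshold {th} must lie in [0, {upper}]")
    return upper

print(check_prune_threshold(0.01, trunc_level=10))  # 0.1
```

With trunc_level = 10 the upper bound is 0.1, so the value 0.01 used in the example code passes the check.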
The fitted model only returns the important hyperparameters, the ELBOs and the training time. To get the estimated parameters, call .getEstimates(). To get the estimated posterior probability matrix for each individual, call .getPiMat().
Built-in functions print the fitting results (.printRes(), together with the ground truth if provided), visualise the facets (.plotFacet()) and show the facet cluster assignments (.showClust()). You can also define your own print and plot functions.
.save(): save the model
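A posterior probability matrix such as the one obtained from .getPiMat() can be turned into hard cluster labels by taking the most probable cluster for each individual. The sketch below assumes one N*K matrix per facet whose rows sum to one. That layout is an assumption, since the exact return structure of .getPiMat() is not described here, and `hard_assignments` is a hypothetical helper rather than a package function.

```python
import numpy as np

def hard_assignments(pi_mat):
    """Most probable cluster index for each row of an (N, K) probability matrix."""
    pi_mat = np.asarray(pi_mat, dtype=float)
    return pi_mat.argmax(axis=1)

# Made-up posterior probabilities for 3 individuals and 3 candidate clusters.
pi_example = np.array([[0.9, 0.1, 0.0],
                       [0.2, 0.7, 0.1],
                       [0.6, 0.3, 0.1]])
labels = hard_assignments(pi_example)
print(labels)                                               # [0 1 0]
print(np.bincount(labels, minlength=pi_example.shape[1]))   # [2 1 0] cluster sizes
```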

Example code for running the method

import numpy as np
from NPBayesMFVI import MFNLG, MFVAR
from joblib import Parallel, delayed

def runParal(s): # for parallel runs
  model = MFVAR(data=data, trunc_level = 10, seed=s)
  # model = MFNLG(data=data, trunc_level = 10, inter_knots=int_knots, seed=s)
  model.initialize()
  model.fit(maxIter = 300, prune_threshold=0.01)
  return model

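data = ...  # replace with a numpy array of shape N*T (MFNLG) or N*D*T (MFVAR)
seeds = np.random.randint(0, 10**6, size=10).tolist()  # one random seed per parallel run (the count of 10 is arbitrary)
num_cores = 50  # number of parallel workers; adjust to your machine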
paral_res = Parallel(n_jobs=num_cores)(delayed(runParal)(s) for s in seeds)
# choose the run with the highest final ELBO
ELBOs = [(i, np.round(paral_res[i].ELBO_iters[-1], 2)) for i in range(len(paral_res))]
print('Final ELBOs of the different runs:\n', ELBOs)
model_idx = int(np.nanargmax([elbo for _, elbo in ELBOs]))  # runs whose final ELBO is NaN are ignored
model = paral_res[model_idx]
print('Best model at index:', model_idx, "with ELBO:", np.round(model.ELBO_iters[-1], 2), "; Model seed:", model.seed)
model.printRes() # print estimated results and sd; can also print ground truth if provided (a,pi_a,beta/Beta,pi_beta/pi_Beta,sigma,pi_sigma)
model.plotFacet()
model.showClust() # plot contingency table of cluster assignments for intercepts and coefficients
model.getEstimates() # get all estimates for customized analysis and plots
model.save(filePath) # filePath: destination path for the saved model
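For customized analysis, a contingency table of cluster assignments across two facets, of the kind that .showClust() plots, can also be built by hand from hard labels. The following is a minimal NumPy sketch that does not use the package's own code; `contingency_table` and the example label arrays are hypothetical.

```python
import numpy as np

def contingency_table(labels_a, labels_b):
    """Count how many individuals fall into each (cluster_a, cluster_b) pair."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    table = np.zeros((labels_a.max() + 1, labels_b.max() + 1), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(table, (labels_a, labels_b), 1)
    return table

# Made-up hard labels of 5 individuals for two facets.
intercept_labels = np.array([0, 0, 1, 1, 1])
shape_labels = np.array([0, 1, 1, 1, 0])
print(contingency_table(intercept_labels, shape_labels))
# [[1 1]
#  [1 2]]
```

Rows index the clusters of the first facet and columns those of the second. Off-diagonal counts show individuals that share a cluster in one facet but not in the other.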
