torch_kmeans package
- class torch_kmeans.KMeans(init_method: str = 'rnd', num_init: int = 8, max_iter: int = 100, distance: BaseDistance = LpDistance, p_norm: int = 2, tol: float = 0.0001, normalize: Optional[Union[str, bool]] = None, n_clusters: Optional[int] = 8, verbose: bool = True, seed: Optional[int] = 123, **kwargs)[source]
Bases:
Module
Implements k-means clustering in terms of PyTorch tensor operations, which can be run on a GPU. Supports batches of instances for use in batched training (e.g. for neural networks).
- Partly based on ideas from:
- Parameters
init_method (str) – Method to initialize cluster centers [‘rnd’, ‘k-means++’] (default: ‘rnd’)
num_init (int) – Number of different initial starting configurations, i.e. different sets of initial centers (default: 8).
max_iter (int) – Maximum number of iterations (default: 100).
distance (BaseDistance) – batched distance evaluator (default: LpDistance).
p_norm (int) – norm for lp distance (default: 2).
tol (float) – Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence (default: 1e-4).
normalize (Optional[Union[str, bool]]) – String id of the method used to normalize the input, one of [‘mean’, ‘minmax’, ‘unit’]. None disables normalization (default: None).
n_clusters (Optional[int]) – Default number of clusters to use if not provided in call (optional, default: 8).
verbose (bool) – Verbosity flag to print additional info (default: True).
seed (Optional[int]) – Seed to fix random state for randomized center inits (default: 123).
**kwargs – additional keyword arguments for the distance function.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
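The tol parameter above describes a stopping rule based on how far the centers move between two iterations. The following is a minimal sketch of such a check in plain PyTorch. The function name centers_converged is hypothetical, and comparing the unscaled Frobenius norm directly against tol is an assumption; the library may scale the shift differently.

```python
import torch

def centers_converged(old_centers: torch.Tensor,
                      new_centers: torch.Tensor,
                      tol: float = 1e-4) -> torch.Tensor:
    """Return a (BS,) bool tensor, True where the centers moved less than tol.

    Assumption: the shift is the Frobenius norm of the (K, D) difference,
    compared directly against tol.
    """
    # matrix_norm reduces over the last two dims, giving one value per instance.
    shift = torch.linalg.matrix_norm(new_centers - old_centers, ord="fro")
    return shift < tol

old = torch.zeros(2, 3, 2)   # (BS=2, K=3, D=2)
new = old.clone()
new[1, 0, 0] = 0.5           # only the second instance moved
converged = centers_converged(old, new)
```

Here the first instance counts as converged and the second does not, so a batched loop could stop once every entry is True.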
- INIT_METHODS = ['rnd', 'k-means++']
- NORM_METHODS = ['mean', 'minmax', 'unit']
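For readers unfamiliar with the 'k-means++' entry in INIT_METHODS, the sketch below shows standard k-means++ seeding for a single instance of shape (N, D). It is an illustration of the general technique, not the library's batched implementation; the function name kmeans_pp_init and its arguments are hypothetical.

```python
import torch

def kmeans_pp_init(x: torch.Tensor, k: int, seed: int = 123) -> torch.Tensor:
    """Pick k initial centers from the rows of x (N, D) with k-means++ seeding."""
    gen = torch.Generator().manual_seed(seed)
    n = x.size(0)
    # First center: chosen uniformly at random.
    idx = [int(torch.randint(n, (1,), generator=gen))]
    for _ in range(k - 1):
        # Squared distance of every point to its nearest already chosen center.
        diff = x[:, None, :] - x[idx][None, :, :]      # (N, len(idx), D)
        d2 = diff.pow(2).sum(-1).min(dim=1).values      # (N,)
        # Next center: sampled with probability proportional to d2.
        idx.append(int(torch.multinomial(d2, 1, generator=gen)))
    return x[idx]

torch.manual_seed(0)
x = torch.randn(50, 2)
centers = kmeans_pp_init(x, k=4)   # (4, 2), each row is a sample from x
```

Points far from the current centers are more likely to be picked, which tends to spread the initial centers out compared with purely random ('rnd') selection.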
- property num_clusters: Union[int, Tensor, Any]
Number of clusters in fitted model. Returns a tensor with possibly different numbers of clusters per instance for whole batch.
- forward(x: Tensor, k: Optional[Union[LongTensor, Tensor, int]] = None, centers: Optional[Tensor] = None, **kwargs) ClusterResult [source]
torch.nn like forward pass.
- Parameters
x (Tensor) – input features/coordinates (BS, N, D)
k (Optional[Union[LongTensor, Tensor, int]]) – optional batch of (possibly different) numbers of clusters per instance (BS, )
centers (Optional[Tensor]) – optional batch of initial centers to use (BS, K, D)
**kwargs – additional kwargs for initialization or cluster procedure
- Returns
ClusterResult tuple
- Return type
ClusterResult
- fit(x: Tensor, k: Optional[Union[LongTensor, Tensor, int]] = None, centers: Optional[Tensor] = None, **kwargs) Module [source]
Compute cluster centers and predict cluster index for each sample.
- Parameters
x (Tensor) – input features/coordinates (BS, N, D)
k (Optional[Union[LongTensor, Tensor, int]]) – optional batch of (possibly different) numbers of clusters per instance (BS, )
centers (Optional[Tensor]) – optional batch of initial centers to use (BS, K, D)
**kwargs – additional kwargs for initialization or cluster procedure
- Returns
KMeans model
- Return type
Module
- predict(x: Tensor, **kwargs) LongTensor [source]
Predict the closest cluster each sample in x belongs to.
- Parameters
x (Tensor) – input features/coordinates (BS, N, D)
**kwargs – additional kwargs for assignment procedure
- Returns
batch tensor of cluster labels for each sample (BS, N)
- Return type
LongTensor
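Conceptually, predicting the closest cluster amounts to a batched nearest-center lookup. A minimal sketch in plain PyTorch, assuming Euclidean distance and already fitted centers of shape (BS, K, D); the helper name assign_to_nearest is hypothetical and not part of the package:

```python
import torch

def assign_to_nearest(x: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """x: (BS, N, D), centers: (BS, K, D) -> labels (BS, N)."""
    dist = torch.cdist(x, centers, p=2)   # pairwise distances, (BS, N, K)
    return dist.argmin(dim=-1)            # index of the closest center per sample

centers = torch.tensor([[[0.0, 0.0], [10.0, 10.0]]])       # (1, 2, 2)
x = torch.tensor([[[1.0, 0.0], [9.0, 9.0], [0.0, 2.0]]])   # (1, 3, 2)
labels = assign_to_nearest(x, centers)
```

The first and third samples land in cluster 0 and the second in cluster 1.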
- fit_predict(x: Tensor, k: Optional[Union[LongTensor, Tensor, int]] = None, centers: Optional[Tensor] = None, **kwargs) LongTensor [source]
Compute cluster centers and predict cluster index for each sample.
- Parameters
x (Tensor) – input features/coordinates (BS, N, D)
k (Optional[Union[LongTensor, Tensor, int]]) – optional batch of (possibly different) numbers of clusters per instance (BS, )
centers (Optional[Tensor]) – optional batch of initial centers to use (BS, K, D)
**kwargs – additional kwargs for initialization or cluster procedure
- Returns
batch tensor of cluster labels for each sample (BS, N)
- Return type
LongTensor
- class torch_kmeans.ConstrainedKMeans(init_method: str = 'rnd', num_init: int = 8, max_iter: int = 100, distance: BaseDistance = LpDistance, p_norm: int = 2, tol: float = 0.0001, n_clusters: Optional[int] = 8, verbose: bool = True, seed: Optional[int] = 123, n_priority_trials_before_fall_back: int = 5, raise_infeasible: bool = True, **kwargs)[source]
Bases:
KMeans
Implements constrained k-means clustering. The priority implementation is based on the method of the paper cited below.
- Paper:
Geetha, S., G. Poonthalir, and P. T. Vanathi. “Improved k-means algorithm for capacitated clustering problem.” INFOCOMP Journal of Computer Science 8.4 (2009)
- Parameters
init_method (str) – Method to initialize cluster centers: [‘rnd’, ‘topk’, ‘k-means++’, ‘ckm++’] (default: ‘rnd’)
num_init (int) – Number of different initial starting configurations, i.e. different sets of initial centers (default: 8).
max_iter (int) – Maximum number of iterations (default: 100).
distance (BaseDistance) – batched distance evaluator (default: LpDistance).
p_norm (int) – norm for lp distance (default: 2).
tol (float) – Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence (default: 1e-4).
n_clusters (Optional[int]) – Default number of clusters to use if not provided in call (optional, default: 8).
verbose (bool) – Verbosity flag to print additional info (default: True).
seed (Optional[int]) – Seed to fix random state for randomized center inits (default: 123).
n_priority_trials_before_fall_back (int) – Number of trials trying to assign samples to constrained clusters based on priority values before falling back to assigning the node with the highest weight to a cluster which can still accommodate it or the dummy cluster otherwise. (default: 5)
raise_infeasible (bool) – If set to False, only a warning is displayed instead of raising an error (default: True).
**kwargs – additional keyword arguments for the distance function.
- INIT_METHODS = ['rnd', 'k-means++', 'topk', 'ckm++']
- NORM_METHODS = []
- predict(x: Tensor, weights: Tensor, **kwargs) LongTensor [source]
Predict the closest cluster each sample in x belongs to.
- Parameters
x (Tensor) – input features/coordinates (BS, N, D)
weights (Tensor) – normalized weight for each sample (BS, N)
**kwargs – additional kwargs for assignment procedure
- Returns
batch tensor of cluster labels for each sample (BS, N)
- Return type
LongTensor
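To make the role of the weights argument concrete, the sketch below sums the weights assigned to each cluster and checks them against a capacity. It assumes that "normalized" means each cluster has a capacity of 1.0, which is not stated in this reference; the helper name cluster_loads is hypothetical.

```python
import torch

def cluster_loads(labels: torch.Tensor, weights: torch.Tensor, k: int) -> torch.Tensor:
    """labels: (BS, N) long, weights: (BS, N) -> total weight per cluster (BS, K)."""
    loads = torch.zeros(labels.size(0), k, dtype=weights.dtype)
    # Add each sample's weight to the slot of its assigned cluster.
    return loads.scatter_add(1, labels, weights)

labels = torch.tensor([[0, 0, 1, 1]])
weights = torch.tensor([[0.5, 0.25, 0.5, 0.75]])

loads = cluster_loads(labels, weights, k=2)   # [[0.75, 1.25]]
# Assumed capacity of 1.0 per cluster: the second cluster is overloaded.
feasible = (loads <= 1.0).all(dim=-1)
```

An assignment like this one would violate the assumed capacity, which is the kind of situation the constrained assignment procedure is meant to avoid.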
- class torch_kmeans.SoftKMeans(init_method: str = 'rnd', num_init: int = 1, max_iter: int = 100, distance: BaseDistance = CosineSimilarity, p_norm: int = 1, normalize: str = 'unit', tol: float = 1e-05, n_clusters: Optional[int] = 8, verbose: bool = True, seed: Optional[int] = 123, temp: float = 5.0, **kwargs)[source]
Bases:
KMeans
Implements differentiable soft k-means clustering. Method adapted from https://github.com/bwilder0/clusternet to support batches.
- Paper:
Wilder et al., “End to End Learning and Optimization on Graphs” (NeurIPS’2019)
- Parameters
init_method (str) – Method to initialize cluster centers: [‘rnd’, ‘topk’] (default: ‘rnd’)
num_init (int) – Number of different initial starting configurations, i.e. different sets of initial centers. If > 1, the best configuration is selected before propagating through the fixpoint (default: 1).
max_iter (int) – Maximum number of iterations (default: 100).
distance (BaseDistance) – batched distance evaluator (default: CosineSimilarity).
p_norm (int) – norm for lp distance (default: 1).
normalize (str) – id of method to use to normalize input. (default: ‘unit’).
tol (float) – Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence (default: 1e-5).
n_clusters (Optional[int]) – Default number of clusters to use if not provided in call (optional, default: 8).
verbose (bool) – Verbosity flag to print additional info (default: True).
seed (Optional[int]) – Seed to fix random state for randomized center inits (default: 123).
temp (float) – temperature for soft cluster assignments (default: 5.0).
**kwargs – additional keyword arguments for the distance function.
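The temp parameter controls how sharp the soft cluster assignments are. A minimal sketch of a temperature-scaled softmax over cosine similarities, in the spirit of the clusternet method referenced above; the function name soft_assign is hypothetical, and the exact update used by the library may differ.

```python
import torch
import torch.nn.functional as F

def soft_assign(x: torch.Tensor, centers: torch.Tensor, temp: float = 5.0) -> torch.Tensor:
    """x: (BS, N, D), centers: (BS, K, D) -> assignment probabilities (BS, N, K)."""
    # Unit-normalize both sides so the dot product is the cosine similarity.
    x = F.normalize(x, dim=-1)
    centers = F.normalize(centers, dim=-1)
    sim = torch.bmm(x, centers.transpose(1, 2))    # (BS, N, K)
    # Larger temp pushes the softmax toward a hard, one-hot assignment.
    return torch.softmax(temp * sim, dim=-1)

torch.manual_seed(0)
x = torch.randn(2, 5, 3)
centers = torch.randn(2, 4, 3)

probs = soft_assign(x, centers, temp=5.0)      # rows sum to 1
sharper = soft_assign(x, centers, temp=50.0)   # closer to one-hot
```

Because every step is a differentiable tensor operation, gradients can flow from the assignment probabilities back to x and the centers.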
- class torch_kmeans.LpDistance(**kwargs)[source]
Bases:
BaseDistance
- compute_mat(query_emb: Tensor, ref_emb: Optional[Tensor] = None) Tensor [source]
Compute the batched p-norm distance between each pair of the two collections of row vectors.
- Parameters
query_emb (Tensor) –
ref_emb (Optional[Tensor]) –
- Return type
Tensor
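As an illustration of a batched p-norm distance matrix, the sketch below uses torch.cdist from plain PyTorch rather than the package's own class. The shapes (BS, N, D) and (BS, K, D) follow the conventions used elsewhere in this reference; how compute_mat treats ref_emb=None is not stated here, so the sketch always passes both inputs.

```python
import torch

query = torch.tensor([[[0.0, 0.0], [3.0, 4.0]]])   # (BS=1, N=2, D=2)
ref = torch.tensor([[[0.0, 0.0]]])                  # (BS=1, K=1, D=2)

# Pairwise distances between every query row and every ref row, (BS, N, K).
d_l2 = torch.cdist(query, ref, p=2)   # Euclidean: [[[0.], [5.]]]
d_l1 = torch.cdist(query, ref, p=1)   # Manhattan: [[[0.], [7.]]]
```

The p argument plays the same role as the p_norm parameter of the clustering classes.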
- class torch_kmeans.DotProductSimilarity(**kwargs)[source]
Bases:
BaseDistance
- compute_mat(query_emb: Tensor, ref_emb: Tensor) Tensor [source]
- Parameters
query_emb (Tensor) –
ref_emb (Tensor) –
- Return type
Tensor
- class torch_kmeans.CosineSimilarity(**kwargs)[source]
Bases:
DotProductSimilarity
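The class hierarchy above (CosineSimilarity derives from DotProductSimilarity) reflects a standard relationship: cosine similarity is the dot product of unit-normalized vectors. A minimal sketch in plain PyTorch; both helper functions are hypothetical and only mirror the idea, not the package's code.

```python
import torch
import torch.nn.functional as F

def dot_product_similarity(query: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """query: (BS, N, D), ref: (BS, K, D) -> similarity matrix (BS, N, K)."""
    return torch.bmm(query, ref.transpose(1, 2))

def cosine_similarity(query: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """Dot product after scaling every row vector to unit length."""
    return dot_product_similarity(F.normalize(query, dim=-1),
                                  F.normalize(ref, dim=-1))

query = torch.tensor([[[1.0, 0.0], [1.0, 1.0]]])
ref = torch.tensor([[[2.0, 0.0], [0.0, 3.0]]])

dot = dot_product_similarity(query, ref)   # [[[2., 0.], [2., 3.]]]
cos = cosine_similarity(query, ref)        # [[[1., 0.], [0.7071, 0.7071]]]
```

Note that both are similarities, so larger values mean closer, the opposite of a distance such as the p-norm.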
- class torch_kmeans.ClusterResult(labels: LongTensor, centers: Tensor, inertia: Tensor, x_org: Tensor, x_norm: Tensor, k: LongTensor, soft_assignment: Optional[Tensor] = None)[source]
Bases:
tuple
Named and typed result tuple for kmeans algorithms
- Parameters
labels (LongTensor) – label for each sample in x
centers (Tensor) – corresponding coordinates of cluster centers
inertia (Tensor) – sum of squared distances of samples to their closest cluster center
x_org (Tensor) – original x
x_norm (Tensor) – normalized x which was used for cluster centers and labels
k (LongTensor) – number of clusters
soft_assignment (Optional[Tensor]) – assignment probabilities of soft kmeans
Create new instance of ClusterResult(labels, centers, inertia, x_org, x_norm, k, soft_assignment)
- labels: LongTensor
Alias for field number 0
- centers: Tensor
Alias for field number 1
- inertia: Tensor
Alias for field number 2
- x_org: Tensor
Alias for field number 3
- x_norm: Tensor
Alias for field number 4
- k: LongTensor
Alias for field number 5
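The inertia field is described above as the sum of squared distances of samples to their closest cluster center. A minimal sketch of that quantity in plain PyTorch, assuming squared Euclidean distance and one value per batch instance; the helper name inertia_from_labels is hypothetical.

```python
import torch

def inertia_from_labels(x: torch.Tensor, centers: torch.Tensor,
                        labels: torch.Tensor) -> torch.Tensor:
    """x: (BS, N, D), centers: (BS, K, D), labels: (BS, N) -> inertia (BS,)."""
    # Look up the assigned center of every sample: (BS, N, D).
    idx = labels.unsqueeze(-1).expand(-1, -1, x.size(-1))
    assigned = centers.gather(1, idx)
    # Sum the squared distances over samples and feature dimensions.
    return (x - assigned).pow(2).sum(dim=(-2, -1))

x = torch.tensor([[[1.0, 0.0], [9.0, 9.0], [0.0, 2.0]]])
centers = torch.tensor([[[0.0, 0.0], [10.0, 10.0]]])
labels = torch.tensor([[0, 1, 0]])

inertia = inertia_from_labels(x, centers, labels)   # 1 + 2 + 4 = 7
```

Lower inertia means tighter clusters, which makes it a natural criterion for picking the best of several initializations.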