cluster
The functionality of this module is primarily exposed and bundled by the Clustering class. An instance of this class aggregates various types (defined in _types).
Clustering
-
class cnnclustering.cluster.Clustering(input_data=None, fitter=None, predictor=None, labels=None, alias: unicode = 'root', parent=None, **kwargs)
Represents a clustering endeavour
A clustering object is made by aggregation of all necessary parts to carry out a clustering of input data points.
- Keyword Arguments
input_data – Any object implementing the input data interface. Represents the data points to be clustered. If this is not a valid (registered) concrete implementation of InputData, this invokes the creation of a clustering via ClusteringBuilder.
fitter – Any object implementing the fitter interface. Executes the clustering procedure.
predictor – Any object implementing the predictor interface. Translates a clustering result to another Clustering object with different input_data.
labels – An instance of Labels holding cluster label assignments for points in input_data. If this is not an instance of cnnclustering._types.Labels, attempts a corresponding initialisation.
alias – A descriptive string identifier associated with this clustering.
parent – An instance of Clustering of which this clustering is a child.
Note
A clustering instance may also be created using the clustering builder ClusteringBuilder, e.g. as clustering = ClusteringBuilder(data).build().
-
property children
Return a mapping of child cluster labels to cnnclustering.cluster.Clustering instances representing the children of this clustering.
-
evaluate(self, ax=None, clusters: Optional[Container[int]] = None, original: bool = False, unicode plot_style: str = u'dots', parts: Optional[Tuple[Optional[int]]] = None, points: Optional[Tuple[Optional[int]]] = None, dim: Optional[Tuple[int, int]] = None, mask: Optional[Sequence[Union[bool, int]]] = None, ax_props: Optional[dict] = None, annotate: bool = True, unicode annotate_pos: str = u'mean', annotate_props: Optional[dict] = None, plot_props: Optional[dict] = None, plot_noise_props: Optional[dict] = None, hist_props: Optional[dict] = None, free_energy: bool = True)
Returns a 2D plot of an original data set or a cluster result
- Parameters
ax – The Axes instance to which to add the plot. If None, a new Figure with Axes will be created.
clusters – Cluster numbers to include in the plot. If None, consider all.
original – Allows plotting the original data instead of a cluster result. Overrides clusters. Will be considered True if no cluster result is present.
plot_style – The kind of plotting method to use:
“dots”, ax.plot()
“scatter”, ax.scatter()
“contour”, ax.contour()
“contourf”, ax.contourf()
parts – Use a slice (start, stop, stride) on the data parts before plotting. Will be applied before a slice on points.
points – Use a slice (start, stop, stride) on the data points before plotting.
dim – Use these two dimensions for plotting. If None, uses (0, 1).
mask – Sequence of boolean or integer values used for optional fancy indexing on the point data array. Note that this is applied after regular slicing (e.g. via points) and requires a copy of the indexed data (may be slow and memory intensive for big data sets).
annotate – If there is a cluster result, plot the cluster numbers. Uses annotate_pos to determine the position of the annotations.
annotate_pos – Where to put the cluster number annotation. Can be one of:
“mean”, Use the cluster mean
“random”, Use a random point of the cluster
Alternatively, a list of x, y positions can be passed to set a specific point for each cluster (not yet implemented).
annotate_props – Dictionary of keyword arguments passed to ax.annotate().
ax_props – Dictionary of ax properties to apply after plotting via ax.set(**ax_props). If None, uses defaults that can also be defined in the configuration file (not yet implemented).
plot_props – Dictionary of keyword arguments passed to various functions (plot.plot_dots() etc.) with different meaning to format cluster plotting. If None, uses defaults that can also be defined in the configuration file (not yet implemented).
plot_noise_props – Like plot_props but for formatting noise point plotting.
hist_props – Dictionary of keyword arguments passed to functions that involve the computing of a histogram via numpy.histogram2d.
free_energy – If True, converts computed histograms to pseudo free energy surfaces.
- Returns
Figure, Axes and a list of plotted elements
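The free_energy option can be pictured with a small sketch. It assumes the common convention F = -ln(count / max_count), which this page does not confirm for cnnclustering, and the helper name to_pseudo_free_energy is hypothetical. Plain nested lists stand in for the histogram array.

```python
import math


def to_pseudo_free_energy(histogram):
    """Convert a 2D histogram (rows of counts) to -ln(count / max_count).

    Empty bins are mapped to +inf. The convention is an assumption made
    for illustration, not necessarily the one used by the library.
    """
    max_count = max(max(row) for row in histogram)
    if max_count <= 0:
        raise ValueError("histogram has no populated bins")
    return [
        [-math.log(count / max_count) if count > 0 else math.inf for count in row]
        for row in histogram
    ]


# Made-up bin counts: the most populated bin ends up at zero.
counts = [[4, 2], [0, 1]]
surface = to_pseudo_free_energy(counts)
```

Under this convention the most populated bin defines the zero of the surface and sparsely populated bins appear as high values.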
-
fit(self, double radius_cutoff: float, cnn_cutoff: int, member_cutoff: int = None, max_clusters: int = None, cnn_offset: int = None, sort_by_size: bool = True, info: bool = True, record: bool = True, record_time: bool = True, v: bool = True, purge: bool = False) → None
Execute clustering procedure
- Parameters
radius_cutoff – Neighbour search radius.
cnn_cutoff – Similarity criterion.
member_cutoff – Valid clusters need to have at least this many members. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise, in which case valid clusters have at least one member.
max_clusters – Keep only the largest max_clusters clusters. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise.
cnn_offset – Exists for compatibility reasons and is subtracted from cnn_cutoff. If cnn_offset = 0, two points need to share at least cnn_cutoff neighbours to be part of the same cluster without counting either of the two points. In former versions of the clustering, self-counting was included, so cnn_cutoff = 2 there is equivalent to cnn_cutoff = 0 in this version.
sort_by_size – Whether to sort (and trim) the created Labels instance. See also Labels.sort_by_size().
info – Whether to modify Labels.meta information for this clustering.
record – Whether to create a Record instance for this clustering which is appended to the Summary.
record_time – Whether to time clustering execution.
v – Be chatty.
purge – If True, force re-initialisation of cluster label assignments.
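The combined effect of member_cutoff and max_clusters during sorting can be sketched in plain Python. This is an illustration of the described behaviour, not the implementation of Labels.sort_by_size(); the function name sort_labels_by_size and the tie-breaking rule are assumptions. Label 0 is treated as noise.

```python
from collections import Counter


def sort_labels_by_size(labels, member_cutoff=1, max_clusters=None):
    """Relabel clusters by descending size; trimmed clusters become noise (0)."""
    sizes = Counter(label for label in labels if label != 0)
    # Largest cluster first; ties are broken by the old label (an assumption).
    ranked = sorted(sizes, key=lambda label: (-sizes[label], label))
    kept = [label for label in ranked if sizes[label] >= member_cutoff]
    if max_clusters is not None:
        kept = kept[:max_clusters]
    new_label = {old: new for new, old in enumerate(kept, start=1)}
    return [new_label.get(label, 0) for label in labels]


labels = [3, 3, 3, 1, 1, 2, 0, 3]
sorted_default = sort_labels_by_size(labels)
sorted_cutoff = sort_labels_by_size(labels, member_cutoff=2)
sorted_largest = sort_labels_by_size(labels, max_clusters=1)
```

With member_cutoff=2 the single-member cluster is turned into noise, and with max_clusters=1 only the largest cluster keeps a non-zero label.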
-
fit_hierarchical(self, radius_cutoff: Union[float, List[float]], cnn_cutoff: Union[int, List[int]], member_cutoff: int = None, max_clusters: int = None, cnn_offset: int = None)
Execute hierarchical clustering procedure
-
property fitter
-
classmethod get_builder_kwargs(cls)
-
get_child(self, label)
-
property hierarchy_level
The level of this clustering in the hierarchical tree of clusterings (0 for the root instance).
-
info(self)
-
property input_data
-
isolate(self, bool purge: bool = True, bool isolate_input_data: bool = True)
Create child clusterings from cluster labels
- Parameters
purge – If True, creates a new mapping for the children of this clustering.
isolate_input_data – If True, attaches a subset of the input data of this clustering to the child.
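The idea of attaching a subset of the input data to each child can be sketched by grouping points by their cluster label. The helper group_points_by_label is hypothetical and works on plain lists. Whether the library also creates a child for noise (label 0) is not stated here, so the sketch simply keeps every label.

```python
from collections import defaultdict


def group_points_by_label(labels, points):
    """Map each cluster label to the list of points carrying that label."""
    members = defaultdict(list)
    for label, point in zip(labels, points):
        members[label].append(point)
    return dict(members)


# Made-up label assignments and 2D points.
labels = [1, 1, 0, 2, 1]
points = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.0), (9.0, 9.1), (0.2, 0.2)]
children_data = group_points_by_label(labels, points)
```

Each value in children_data corresponds to the data a child clustering would receive when isolate_input_data is True.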
-
property labels
Direct access to cnnclustering._types.Labels.labels holding cluster label assignments for points in input_data.
-
make_parameters(self, double radius_cutoff: float, cnn_cutoff: int, current_start: int) → Type[ClusterParameters]
-
pie(self, ax=None, pie_props=None)
-
predict(self, other: Type[u'Clustering'], double radius_cutoff: float, cnn_cutoff: int, clusters: Optional[Sequence[int]] = None, cnn_offset: Optional[int] = None, info: bool = True, record: bool = True, record_time: bool = True, v: bool = True, purge: bool = False)
Execute prediction procedure
- Parameters
other – cnnclustering.cluster.Clustering instance for which cluster labels should be predicted.
radius_cutoff – Neighbour search radius.
cnn_cutoff – Similarity criterion.
clusters – Sequence of cluster labels that should be included in the prediction.
cnn_offset – Exists for compatibility reasons and is subtracted from cnn_cutoff. If cnn_offset = 0, two points need to share at least cnn_cutoff neighbours to be part of the same cluster without counting either of the two points. In former versions of the clustering, self-counting was included, so cnn_cutoff = 2 there is equivalent to cnn_cutoff = 0 in this version.
purge – If True, force re-initialisation of predicted cluster labels.
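The role of cnn_offset in the similarity criterion can be illustrated with a minimal sketch. It follows the wording above (shared neighbours are counted without the two points themselves, and cnn_offset is subtracted from cnn_cutoff), but the function name share_enough_neighbours and the neighbour lists are made up for illustration and do not reflect the library's internals.

```python
def share_enough_neighbours(neighbours_a, neighbours_b, a, b, cnn_cutoff, cnn_offset=0):
    """Return True if points a and b pass the common-nearest-neighbour check."""
    # Shared neighbours, not counting the two points themselves.
    shared = (set(neighbours_a) & set(neighbours_b)) - {a, b}
    return len(shared) >= cnn_cutoff - cnn_offset


# Hypothetical neighbour lists (point indices found within radius_cutoff).
neighbours = {
    0: [1, 2, 3],
    1: [0, 2, 3],
    4: [0],
}

# Points 0 and 1 share the neighbours 2 and 3.
pass_two = share_enough_neighbours(neighbours[0], neighbours[1], 0, 1, cnn_cutoff=2)
pass_three = share_enough_neighbours(neighbours[0], neighbours[1], 0, 1, cnn_cutoff=3)

# A former-style cnn_cutoff of 2 with cnn_offset=2 behaves like cnn_cutoff=0 now.
old_style = share_enough_neighbours(neighbours[0], neighbours[4], 0, 4, cnn_cutoff=2, cnn_offset=2)
new_style = share_enough_neighbours(neighbours[0], neighbours[4], 0, 4, cnn_cutoff=0)
```

The last two calls give the same answer, which is the equivalence between the old self-counting convention and the current one.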
-
reel(self, depth: Optional[int] = None) → None
Wrap up label assignments of lower hierarchy levels
- Parameters
depth – How many lower levels to consider. If None, consider all.
-
summarize(self, ax=None, unicode quantity: str = u'execution_time', treat_nan: Optional[Any] = None, convert: Optional[Any] = None, ax_props: Optional[dict] = None, contour_props: Optional[dict] = None, unicode plot_style: str = u'contourf')
Generate a 2D plot of record values
Record values (“time”, “clusters”, “largest”, “noise”) are plotted against cluster parameters (radius cutoff r and cnn cutoff c).
- Parameters
ax – Matplotlib Axes to plot on. If None, a new Figure with Axes will be created.
quantity – Record value to visualise:
“time”
“clusters”
“largest”
“noise”
treat_nan – If not None, use this value to pad nan-values.
ax_props – Used to style ax.
contour_props – Passed on to contour.
-
property summary
Return an instance of cnnclustering.cluster.Summary collecting clustering results for this clustering.
-
to_nx_DiGraph(self, ignore=None)
Convert cluster hierarchy to networkx DiGraph
- Keyword Arguments
ignore – A set of labels not to include in the graph. Use for example to exclude noise (label 0).
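How ignore prunes the hierarchy can be sketched without networkx by collecting directed (parent, child) edges from a nested mapping. Everything here is an assumption made for illustration: the function collect_edges, the nested-dict representation of the hierarchy, and the dotted node names are not taken from the library.

```python
def collect_edges(children, parent_name="root", ignore=None):
    """Walk a nested {label: sub_children} mapping and list (parent, child) edges."""
    ignore = set() if ignore is None else set(ignore)
    edges = []
    for label, sub_children in sorted(children.items()):
        if label in ignore:
            # Skip the ignored label together with everything below it.
            continue
        child_name = f"{parent_name}.{label}"
        edges.append((parent_name, child_name))
        edges.extend(collect_edges(sub_children, child_name, ignore))
    return edges


# Hypothetical hierarchy: cluster 1 was split again into two clusters.
hierarchy = {0: {}, 1: {1: {}, 2: {}}, 2: {}}
edges = collect_edges(hierarchy, ignore={0})
```

An edge list of this form could be passed to networkx.DiGraph to obtain a directed graph.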
-
tree(self, ax=None, ignore=None, pos_props=None, draw_props=None)
-
trim_shrinking_leafs(self)
-
trim_trivial_leafs(self)
Scan cluster hierarchy for removable nodes
If the cluster label assignments on a clustering are all zero (noise), the clustering is considered trivial. In this case, the labels and children are reset to None.
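The trimming rule can be sketched on a small stand-in tree. The Node class and the trim_trivial function are hypothetical; they only mirror the rule stated above (all-zero labels mean the node is trivial, so labels and children are reset to None) and say nothing about how the library traverses its hierarchy.

```python
class Node:
    """Hypothetical stand-in for a clustering in the hierarchy."""

    def __init__(self, labels=None, children=None):
        self.labels = labels
        self.children = children


def trim_trivial(node):
    """Reset labels and children on every node whose labels are all noise (0)."""
    if node.children:
        for child in node.children.values():
            trim_trivial(child)
    if node.labels is not None and all(label == 0 for label in node.labels):
        node.labels = None
        node.children = None


leaf_noise = Node(labels=[0, 0, 0])   # trivial: only noise
leaf_split = Node(labels=[1, 1, 2])   # non-trivial: two clusters
root = Node(labels=[1, 1, 1, 2, 2, 2], children={1: leaf_noise, 2: leaf_split})
trim_trivial(root)
```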
-
class cnnclustering.cluster.ClusteringBuilder(data, preparation_hook=None, registered_recipe_key=None, clustering_type=None, alias=None, parent=None, **recipe)
Orchestrate correct initialisation of a clustering
- Parameters
data – Data that should be clustered in a format compatible with ‘input_data’ specified in the building recipe. May go through preparation_hook to establish compatibility.
- Keyword Arguments
preparation_hook – A function that takes input data as a single argument and returns the (optionally) reformatted data plus additional information (e.g. “meta”) in the form of an argument tuple and a keyword argument dictionary that can be used to initialise an input data type. If None, uses _default_preparation_hook.
recipe – Building instructions for a clustering initialisation. Should be a mapping of component keyword arguments to component type details.
-
aggregate_components(self)
-
build(self)
Initialise clustering with data and components
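A preparation hook of the shape described above might look like the following sketch. The function name nested_list_preparation_hook and the keys stored under “meta” are hypothetical; only the return shape (an argument tuple plus a keyword argument dictionary) is taken from the description.

```python
def nested_list_preparation_hook(data):
    """Hypothetical hook: normalise the input to a list of float lists.

    Returns an argument tuple and a keyword argument dictionary that could
    be used to initialise an input data type.
    """
    points = [[float(x) for x in point] for point in data]
    n_dim = len(points[0]) if points else 0
    args = (points,)
    # The keys inside "meta" are made up for this illustration.
    kwargs = {"meta": {"n_points": len(points), "n_dim": n_dim}}
    return args, kwargs


args, kwargs = nested_list_preparation_hook([(0, 1), (2, 3), (4, 5)])
```

Such a function would be passed as preparation_hook so that the builder can reformat the raw data before the input data type is created.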
Records and summary
-
class cnnclustering.cluster.Record(n_points=None, radius_cutoff=None, cnn_cutoff=None, member_cutoff=None, max_clusters=None, n_clusters=None, ratio_largest=None, ratio_noise=None, execution_time=None)
Cluster result container
cnnclustering.cluster.Record instances can be created during cnnclustering.cluster.Clustering.fit() and are collected in cnnclustering.cluster.Summary.
-
to_dict(self)
-
-
class cnnclustering.cluster.Summary(iterable=None)
List-like container for cluster results
Stores instances of cnnclustering.cluster.Record.
-
insert(self, index, item)
-
to_DataFrame(self)
Convert list of records to (typed) pandas.DataFrame
- Returns
pandas.DataFrame
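A list-like container restricted to record objects can be sketched with collections.abc.MutableSequence, where insert() is the single entry point for new items. RecordSketch and SummarySketch are hypothetical stand-ins with a reduced set of fields. Whether the real Summary rejects other item types is an assumption, and to_rows() only hints at the kind of row data a DataFrame conversion would start from.

```python
from collections.abc import MutableSequence
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RecordSketch:
    """Hypothetical, reduced stand-in for a cluster result record."""
    n_points: Optional[int] = None
    radius_cutoff: Optional[float] = None
    cnn_cutoff: Optional[int] = None
    n_clusters: Optional[int] = None
    execution_time: Optional[float] = None

    def to_dict(self):
        return asdict(self)


class SummarySketch(MutableSequence):
    """List-like container that only accepts RecordSketch instances."""

    def __init__(self, iterable=None):
        self._list = []
        if iterable is not None:
            for item in iterable:
                self.append(item)  # append() is routed through insert()

    @staticmethod
    def _check(item):
        if not isinstance(item, RecordSketch):
            raise TypeError("SummarySketch only stores RecordSketch instances")

    def __getitem__(self, index):
        return self._list[index]

    def __setitem__(self, index, item):
        self._check(item)
        self._list[index] = item

    def __delitem__(self, index):
        del self._list[index]

    def __len__(self):
        return len(self._list)

    def insert(self, index, item):
        self._check(item)
        self._list.insert(index, item)

    def to_rows(self):
        """Return one dictionary per record."""
        return [record.to_dict() for record in self._list]


summary = SummarySketch()
summary.append(RecordSketch(n_points=100, radius_cutoff=0.5, cnn_cutoff=2, n_clusters=3))
rows = summary.to_rows()
```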
-
-
cnnclustering.cluster.make_typed_DataFrame(columns, dtypes, content=None)
Construct pandas.DataFrame with typed columns
-
cnnclustering.cluster.timed(function)
Decorator to measure execution time
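A timing decorator of this kind can be sketched as follows. How the library's timed reports the measurement is not stated here. Returning the elapsed time alongside the wrapped result is an assumption of this sketch, and the name timed_sketch is hypothetical.

```python
import functools
import time


def timed_sketch(function):
    """Wrap a function so that it returns (result, elapsed_seconds)."""

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = function(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed

    return wrapper


@timed_sketch
def add(a, b):
    return a + b


total, seconds = add(1, 2)
```

functools.wraps keeps the name and docstring of the wrapped function, which helps when the decorated function shows up in documentation or tracebacks.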