cluster¶
The functionality of this module is primarily exposed and bundled by the Clustering class. An instance of this class aggregates various types (defined in _types).
Clustering¶
-
class
cnnclustering.cluster.
Clustering
(input_data=None, fitter=None, predictor=None, labels=None, alias: unicode = 'root', parent=None, **kwargs)¶ Represents a clustering endeavour
A clustering object is made by aggregation of all necessary parts to carry out a clustering of input data points.
- Keyword Arguments
input_data – Any object implementing the input data interface. Represents the data points to be clustered. If this is not a valid (registered) concrete implementation of InputData, this invokes the creation of a clustering via ClusteringBuilder.
fitter – Any object implementing the fitter interface. Executes the clustering procedure.
predictor – Any object implementing the predictor interface. Translates a clustering result to another Clustering object with different input_data.
labels – An instance of Labels holding cluster label assignments for points in input_data. If this is not an instance of cnnclustering._types.Labels, attempts a corresponding initialisation.
alias – A descriptive string identifier associated with this clustering.
parent – An instance of Clustering of which this clustering is a child.
Note
A clustering instance may also be created using the clustering builder ClusteringBuilder, e.g. as clustering = ClusteringBuilder(data).build().
-
property
children
¶ Return a mapping of child cluster labels to cnnclustering.cluster.Clustering instances representing the children of this clustering.
-
evaluate
(self, ax=None, clusters: Optional[Container[int]] = None, original: bool = False, unicode plot_style: str = u'dots', parts: Optional[Tuple[Optional[int]]] = None, points: Optional[Tuple[Optional[int]]] = None, dim: Optional[Tuple[int, int]] = None, mask: Optional[Sequence[Union[bool, int]]] = None, ax_props: Optional[dict] = None, annotate: bool = True, unicode annotate_pos: str = u'mean', annotate_props: Optional[dict] = None, plot_props: Optional[dict] = None, plot_noise_props: Optional[dict] = None, hist_props: Optional[dict] = None, free_energy: bool = True)¶ Returns a 2D plot of an original data set or a cluster result
- Parameters
ax – The Axes instance to which to add the plot. If None, a new Figure with Axes will be created.
clusters – Cluster numbers to include in the plot. If None, consider all.
original – Plot the original data instead of a cluster result. Overrides clusters. Will be considered True if no cluster result is present.
plot_style – The kind of plotting method to use: “dots” (ax.plot()), “scatter” (ax.scatter()), “contour” (ax.contour()), or “contourf” (ax.contourf()).
parts – Use a slice (start, stop, stride) on the data parts before plotting. Will be applied before a slice on points.
points – Use a slice (start, stop, stride) on the data points before plotting.
dim – Use these two dimensions for plotting. If None, uses (0, 1).
mask – Sequence of boolean or integer values used for optional fancy indexing on the point data array. Note that this is applied after regular slicing (e.g. via points) and requires a copy of the indexed data (may be slow and memory intensive for big data sets).
annotate – If there is a cluster result, plot the cluster numbers. Uses annotate_pos to determine the position of the annotations.
annotate_pos – Where to put the cluster number annotation. Can be one of “mean” (use the cluster mean) or “random” (use a random point of the cluster). Alternatively, a list of x, y positions can be passed to set a specific point for each cluster (not yet implemented).
annotate_props – Dictionary of keyword arguments passed to ax.annotate().
ax_props – Dictionary of ax properties to apply after plotting via ax.set(**ax_props). If None, uses defaults that can also be defined in the configuration file (not yet implemented).
plot_props – Dictionary of keyword arguments passed to various functions (plot.plot_dots() etc.) with different meaning to format cluster plotting. If None, uses defaults that can also be defined in the configuration file (not yet implemented).
plot_noise_props – Like plot_props but for formatting noise point plotting.
hist_props – Dictionary of keyword arguments passed to functions that involve the computing of a histogram via numpy.histogram2d.
free_energy – If True, converts computed histograms to pseudo free energy surfaces.
- Returns
Figure, Axes and a list of plotted elements
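The order in which parts, points and mask are applied can be illustrated with a minimal sketch. It is not the library's implementation: select_points is a hypothetical helper, the data is assumed to be a list of parts that each hold a list of points, the points slice is assumed to act within each part, and only a boolean mask is handled.

```python
from itertools import compress
from typing import Optional, Sequence, Tuple


def select_points(
    data: Sequence[Sequence],
    parts: Optional[Tuple[Optional[int], ...]] = None,
    points: Optional[Tuple[Optional[int], ...]] = None,
    mask: Optional[Sequence[bool]] = None,
) -> list:
    """Hypothetical helper mimicking the documented selection order."""
    # 1. Slice the parts (start, stop, stride); None keeps all parts.
    part_slice = slice(*parts) if parts is not None else slice(None)
    selected_parts = data[part_slice]

    # 2. Slice the points within each selected part, then flatten.
    point_slice = slice(*points) if points is not None else slice(None)
    flat = [p for part in selected_parts for p in part[point_slice]]

    # 3. Apply the boolean mask last; this builds a new list (a copy).
    if mask is not None:
        flat = list(compress(flat, mask))
    return flat


data = [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]]
selected = select_points(
    data,
    parts=(None, None, 2),   # every second part: parts 0 and 2
    points=(1, None, None),  # drop the first point of each part
    mask=[True, False, True, True, False, True],
)
print(selected)
```

Because the mask is applied to the already sliced data, its length has to match the number of points that remain after slicing, not the size of the full data set.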
-
fit
(self, double radius_cutoff: float, cnn_cutoff: int, member_cutoff: int = None, max_clusters: int = None, cnn_offset: int = None, sort_by_size: bool = True, info: bool = True, record: bool = True, record_time: bool = True, v: bool = True, purge: bool = False) → None¶ Execute clustering procedure
- Parameters
radius_cutoff – Neighbour search radius.
cnn_cutoff – Similarity criterion.
member_cutoff – Valid clusters need to have at least this many members. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise, and valid clusters have at least one member.
max_clusters – Keep only the largest max_clusters clusters. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise.
cnn_offset – Exists for compatibility reasons and is subtracted from cnn_cutoff. If cnn_offset = 0, two points need to share at least cnn_cutoff neighbours to be part of the same cluster, without counting either of the two points. In former versions of the clustering, self-counting was included, so cnn_cutoff = 2 there is equivalent to cnn_cutoff = 0 in this version.
sort_by_size – Whether to sort (and trim) the created Labels instance. See also Labels.sort_by_size().
info – Whether to modify Labels.meta information for this clustering.
record – Whether to create a Record instance for this clustering which is appended to the Summary.
record_time – Whether to time clustering execution.
v – Be chatty.
purge – If True, force re-initialisation of cluster label assignments.
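How radius_cutoff, cnn_cutoff and cnn_offset interact can be sketched with a brute-force check for a single pair of points. This is an illustration under stated assumptions, not the fitter used by the library: the function names are hypothetical, distances are Euclidean, a point counts as a neighbour when its distance is at most radius_cutoff, and any further conditions the real procedure may impose are left out.

```python
from math import dist
from typing import Sequence, Set, Tuple

Point = Tuple[float, ...]


def neighbours(points: Sequence[Point], index: int, radius_cutoff: float) -> Set[int]:
    """Indices of all points within radius_cutoff of points[index], excluding itself."""
    return {
        j
        for j, other in enumerate(points)
        if j != index and dist(points[index], other) <= radius_cutoff
    }


def share_enough_neighbours(
    points: Sequence[Point],
    a: int,
    b: int,
    radius_cutoff: float,
    cnn_cutoff: int,
    cnn_offset: int = 0,
) -> bool:
    """Hypothetical check of the common-nearest-neighbour criterion for one pair."""
    # Shared neighbours, not counting the two points themselves.
    shared = (neighbours(points, a, radius_cutoff)
              & neighbours(points, b, radius_cutoff)) - {a, b}
    # cnn_offset is subtracted from cnn_cutoff before comparing.
    return len(shared) >= cnn_cutoff - cnn_offset


points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.5, -0.5), (5.0, 5.0)]

# Points 0 and 1 share the two neighbours 2 and 3.
print(share_enough_neighbours(points, 0, 1, radius_cutoff=1.0, cnn_cutoff=2))
print(share_enough_neighbours(points, 0, 1, radius_cutoff=1.0, cnn_cutoff=3))
# With self-counting conventions, a cutoff of 4 and an offset of 2 behave like 2.
print(share_enough_neighbours(points, 0, 1, radius_cutoff=1.0, cnn_cutoff=4, cnn_offset=2))
```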
-
fit_hierarchical
(self, radius_cutoff: Union[float, List[float]], cnn_cutoff: Union[int, List[int]], member_cutoff: int = None, max_clusters: int = None, cnn_offset: int = None)¶ Execute hierarchical clustering procedure
-
property
fitter
¶
-
classmethod
get_builder_kwargs
(cls)¶
-
get_child
(self, label)¶
-
property
hierarchy_level
¶ The level of this clustering in the hierarchical tree of clusterings (0 for the root instance).
-
info
(self)¶
-
property
input_data
¶
-
isolate
(self, bool purge: bool = True, bool isolate_input_data: bool = True)¶ Create child clusterings from cluster labels
- Parameters
purge – If True, creates a new mapping for the children of this clustering.
isolate_input_data – If True, attaches a subset of the input data of this clustering to the child.
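The idea of deriving children from cluster labels can be sketched in plain Python. The helper below is hypothetical and only groups the points by label. The real method creates Clustering instances, and whether it also creates a child for noise (label 0) is not stated here, so this sketch simply keeps every label.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def split_by_label(points: Sequence, labels: Sequence[int]) -> Dict[int, List]:
    """Hypothetical helper: map each cluster label to the subset of its points."""
    members: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)
    # Each child receives only the points that carry its label.
    return {
        label: [points[i] for i in indices]
        for label, indices in sorted(members.items())
    }


points = ["a", "b", "c", "d", "e"]
labels = [1, 1, 2, 0, 2]
children = split_by_label(points, labels)
print(children)
```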
-
property
labels
¶ Direct access to cnnclustering._types.Labels.labels holding cluster label assignments for points in input_data.
-
make_parameters
(self, double radius_cutoff: float, cnn_cutoff: int, current_start: int) → Type[ClusterParameters]¶
-
pie
(self, ax=None, pie_props=None)¶
-
predict
(self, other: Type[u'Clustering'], double radius_cutoff: float, cnn_cutoff: int, clusters: Optional[Sequence[int]] = None, cnn_offset: Optional[int] = None, info: bool = True, record: bool = True, record_time: bool = True, v: bool = True, purge: bool = False)¶ Execute prediction procedure
- Parameters
other – cnnclustering.cluster.Clustering instance for which cluster labels should be predicted.
radius_cutoff – Neighbour search radius.
cnn_cutoff – Similarity criterion.
clusters – Sequence of cluster labels that should be included in the prediction.
cnn_offset – Exists for compatibility reasons and is subtracted from cnn_cutoff. If cnn_offset = 0, two points need to share at least cnn_cutoff neighbours to be part of the same cluster, without counting either of the two points. In former versions of the clustering, self-counting was included, so cnn_cutoff = 2 there is equivalent to cnn_cutoff = 0 in this version.
purge – If True, force re-initialisation of predicted cluster labels.
-
reel
(self, depth: Optional[int] = None) → None¶ Wrap up label assignments of lower hierarchy levels
- Parameters
depth – How many lower levels to consider. If None, consider all.
-
summarize
(self, ax=None, unicode quantity: str = u'execution_time', treat_nan: Optional[Any] = None, convert: Optional[Any] = None, ax_props: Optional[dict] = None, contour_props: Optional[dict] = None, unicode plot_style: str = u'contourf')¶ Generate a 2D plot of record values
Record values (“time”, “clusters”, “largest”, “noise”) are plotted against cluster parameters (radius cutoff r and cnn cutoff c).
- Parameters
ax – Matplotlib Axes to plot on. If None, a new Figure with Axes will be created.
quantity – Record value to visualise: “time”, “clusters”, “largest”, or “noise”.
treat_nan – If not None, use this value to pad nan-values.
ax_props – Used to style ax.
contour_props – Passed on to contour.
-
property
summary
¶ Return an instance of cnnclustering.cluster.Summary collecting clustering results for this clustering.
-
to_nx_DiGraph
(self, ignore=None)¶ Convert cluster hierarchy to networkx DiGraph
- Keyword Arguments
ignore – A set of labels not to include in the graph. Use, for example, to exclude noise (label 0).
-
tree
(self, ax=None, ignore=None, pos_props=None, draw_props=None)¶
-
trim_shrinking_leafs
(self)¶
-
trim_trivial_leafs
(self)¶ Scan cluster hierarchy for removable nodes
If the cluster label assignments on a clustering are all zero (noise), the clustering is considered trivial. In this case, the labels and children are reset to None.
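The rule above can be sketched on a small stand-in hierarchy. Node and trim_trivial are hypothetical names used only for illustration. The sketch assumes a recursive walk from the root and says nothing about how the library traverses its own Clustering objects.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one clustering in the hierarchy."""
    labels: Optional[List[int]] = None
    children: Optional[Dict[int, "Node"]] = None


def trim_trivial(node: Node) -> None:
    """Reset labels and children of nodes whose labels are all zero (noise)."""
    if node.labels is not None and all(label == 0 for label in node.labels):
        node.labels = None
        node.children = None
        return
    # Otherwise keep the node and scan its children.
    for child in (node.children or {}).values():
        trim_trivial(child)


root = Node(
    labels=[1, 1, 2, 0],
    children={
        1: Node(labels=[0, 0], children={}),  # only noise: trivial
        2: Node(labels=[1]),                  # one real cluster: kept
    },
)
trim_trivial(root)
print(root)
```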
-
class
cnnclustering.cluster.
ClusteringBuilder
(data, preparation_hook=None, registered_recipe_key=None, clustering_type=None, alias=None, parent=None, **recipe)¶ Orchestrate correct initialisation of a clustering
- Parameters
data – Data that should be clustered in a format compatible with ‘input_data’ specified in the building recipe. May go through preparation_hook to establish compatibility.
- Keyword Arguments
preparation_hook – A function that takes input data as a single argument and returns the (optionally) reformatted data plus additional information (e.g. “meta”) in the form of an argument tuple and a keyword argument dictionary that can be used to initialise an input data type. If None, uses _default_preparation_hook.
recipe – Building instructions for a clustering initialisation. Should be a mapping of component keyword arguments to component type details.
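A preparation hook of the kind described above might look like the following sketch. The function name, the conversion to lists of floats and the keys stored under “meta” are assumptions for illustration. The return layout (an argument tuple plus a keyword argument dictionary) is one reading of the description and should be checked against the library before use.

```python
from typing import Any, Dict, Sequence, Tuple


def example_preparation_hook(data: Sequence[Sequence[float]]) -> Tuple[tuple, Dict[str, Any]]:
    """Hypothetical hook: reformat the data and collect some meta information."""
    # Reformat: coerce every coordinate to float and every point to a list.
    points = [[float(x) for x in point] for point in data]
    # Additional information; these keys are made up for this sketch.
    meta = {
        "n_points": len(points),
        "n_dim": len(points[0]) if points else 0,
    }
    # Positional arguments and keyword arguments for an input data type.
    return (points,), {"meta": meta}


args, kwargs = example_preparation_hook([(0, 1), (2, 3), (4, 5)])
print(args)
print(kwargs)
```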
-
aggregate_components
(self)¶
-
build
(self)¶ Initialise clustering with data and components
Records and summary¶
-
class
cnnclustering.cluster.
Record
(n_points=None, radius_cutoff=None, cnn_cutoff=None, member_cutoff=None, max_clusters=None, n_clusters=None, ratio_largest=None, ratio_noise=None, execution_time=None)¶ Cluster result container
cnnclustering.cluster.Record instances can be created during cnnclustering.cluster.Clustering.fit() and are collected in cnnclustering.cluster.Summary.
-
to_dict
(self)¶
-
-
class
cnnclustering.cluster.
Summary
(iterable=None)¶ List-like container for cluster results
Stores instances of cnnclustering.cluster.Record.
-
insert
(self, index, item)¶
-
to_DataFrame
(self)¶ Convert list of records to (typed) pandas.DataFrame
- Returns
pandas.DataFrame
-
-
cnnclustering.cluster.
make_typed_DataFrame
(columns, dtypes, content=None)¶ Construct pandas.DataFrame with typed columns
-
cnnclustering.cluster.
timed
(function)¶ Decorator to measure execution time
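A decorator that measures execution time can be sketched with the standard library. This is not the library's timed: the name timed_sketch is hypothetical, and returning the result together with the elapsed seconds is an assumption made to keep the example self-contained.

```python
import functools
import time


def timed_sketch(function):
    """Hypothetical decorator: return (result, elapsed seconds) of a call."""
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = function(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed
    return wrapper


@timed_sketch
def total(n: int) -> int:
    return sum(range(n))


value, seconds = total(1000)
print(value, seconds)
```

time.perf_counter() is used because it is a monotonic, high-resolution clock intended for measuring short durations.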