cluster¶

The functionality of this module is primarily exposed and bundled by the Clustering class. An instance of this class aggregates various types (defined in _types here).

Go to:

Clustering

Records and summary

Clustering¶

class cnnclustering.cluster.Clustering(input_data=None, fitter=None, predictor=None, labels=None, alias: unicode = 'root', parent=None, **kwargs)¶

Represents a clustering endeavour

A clustering object is made by aggregation of all necessary parts to carry out a clustering of input data points.

Keyword Arguments

input_data – Any object implementing the input data interface. Represents the data points to be clustered. If this is not a valid (registered) concrete implementation of InputData, this invokes the creation of a clustering via ClusteringBuilder.
fitter – Any object implementing the fitter interface. Executes the clustering procedure.
predictor – Any object implementing the predictor interface. Translates a clustering result to another Clustering object with different input_data.
labels – An instance of Labels holding cluster label assignments for points in input_data. If this is not an instance of cnnclustering._types.Labels, an attempt is made to initialise one from it.
alias – A descriptive string identifier associated with this clustering.
parent – An instance of Clustering of which this clustering is a child.

Note

A clustering instance may also be created using the clustering builder ClusteringBuilder, e.g. as

clustering = ClusteringBuilder(data).build().

property children¶: Return a mapping of child cluster labels to cnnclustering.cluster.Clustering instances representing the children of this clustering.

evaluate(self, ax=None, clusters: Optional[Container[int]] = None, original: bool = False, unicode plot_style: str = u'dots', parts: Optional[Tuple[Optional[int]]] = None, points: Optional[Tuple[Optional[int]]] = None, dim: Optional[Tuple[int, int]] = None, mask: Optional[Sequence[Union[bool, int]]] = None, ax_props: Optional[dict] = None, annotate: bool = True, unicode annotate_pos: str = u'mean', annotate_props: Optional[dict] = None, plot_props: Optional[dict] = None, plot_noise_props: Optional[dict] = None, hist_props: Optional[dict] = None, free_energy: bool = True)¶

Returns a 2D plot of an original data set or a cluster result

Args:

ax:

The Axes instance to which to add the plot. If None, a new Figure with Axes will be created.

clusters:

Cluster numbers to include in the plot. If None, consider all.

original:

Plot the original data instead of a cluster result. Overrides clusters. Treated as True if no cluster result is present.

plot_style:

The kind of plotting method to use.

“dots”, ax.plot()

“scatter”, ax.scatter()

“contour”, ax.contour()

“contourf”, ax.contourf()

parts:

Use a slice (start, stop, stride) on the data parts before plotting. Will be applied before a slice on points.

points:

Use a slice (start, stop, stride) on the data points before plotting.

dim:

Use these two dimensions for plotting. If None, uses (0, 1).

mask:

Sequence of boolean or integer values used for optional fancy indexing on the point data array. Note that this is applied after regular slicing (e.g. via points) and requires a copy of the indexed data, which may be slow and memory intensive for big data sets.

annotate:

If there is a cluster result, plot the cluster numbers. Uses annotate_pos to determine the position of the annotations.

annotate_pos:

Where to put the cluster number annotation. Can be one of:

“mean”, Use the cluster mean

“random”, Use a random point of the cluster

Alternatively a list of x, y positions can be passed to set a specific point for each cluster (Not yet implemented)

annotate_props:

Dictionary of keyword arguments passed to ax.annotate().

ax_props:

Dictionary of ax properties to apply after plotting via ax.set(**ax_props). If None, uses defaults that can also be defined in the configuration file (not yet implemented).

plot_props:

Dictionary of keyword arguments passed to the plotting functions (plot.plot_dots() etc.) to format cluster plotting; the meaning of the keys depends on the function. If None, uses defaults that can also be defined in the configuration file (not yet implemented).

plot_noise_props:

Like plot_props but for formatting noise point plotting.

hist_props:

Dictionary of keyword arguments passed to functions that involve the computing of a histogram via numpy.histogram2d.

free_energy:

If True, converts computed histograms to pseudo free energy surfaces.
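As a rough illustration of what a conversion to a pseudo free energy surface can look like, here is a minimal sketch. It assumes the common convention of taking the negative logarithm of the bin counts, shifting the surface so that the most populated bin sits at zero, and mapping empty bins to infinity. The function name to_pseudo_free_energy is hypothetical, and the conversion actually used by the library may differ.

```python
import math


def to_pseudo_free_energy(hist):
    """Convert a 2D histogram (list of rows of counts) to -ln(count).

    Assumption: empty bins map to +inf and the surface is shifted so
    that its minimum (the most populated bin) is zero.
    """
    energies = [
        [-math.log(count) if count > 0 else math.inf for count in row]
        for row in hist
    ]
    finite = [e for row in energies for e in row if math.isfinite(e)]
    if not finite:
        return energies  # nothing populated, nothing to shift
    minimum = min(finite)
    return [[e - minimum for e in row] for row in energies]


hist = [[0, 2], [8, 4]]
surface = to_pseudo_free_energy(hist)
```

In this sketch the bin with 8 counts ends up at zero, the empty bin stays at infinity, and every other bin receives a positive value that grows as its population shrinks.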

Returns: Figure, Axes and a list of plotted elements
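The order in which the points slice and mask are applied can be sketched in plain Python. This is only an illustration of the documented order (regular slicing first, fancy indexing afterwards) on a list of points; the helper select_points is hypothetical and not part of the library, and the parts slice is left out for brevity.

```python
def select_points(points, points_slice=None, mask=None):
    """Apply a (start, stop, stride) slice, then an optional mask."""
    if points_slice is None:
        points_slice = (None, None, None)
    selected = points[slice(*points_slice)]  # regular slicing first
    if mask is None:
        return selected
    mask = list(mask)
    if all(isinstance(m, bool) for m in mask):
        # Boolean mask: keep entries where the flag is True.
        return [p for p, keep in zip(selected, mask) if keep]
    # Integer mask: pick entries by position (builds a new list).
    return [selected[i] for i in mask]


data = [(0.0, 0.1), (1.0, 1.1), (2.0, 2.1), (3.0, 3.1), (4.0, 4.1), (5.0, 5.1)]
every_second = select_points(data, points_slice=(None, None, 2))
masked = select_points(data, points_slice=(None, None, 2), mask=[True, False, True])
picked = select_points(data, points_slice=(1, None, None), mask=[0, 2])
```

Because the mask indexes the already sliced data, its positions refer to the sliced sequence, not to the original one.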

fit(self, double radius_cutoff: float, cnn_cutoff: int, member_cutoff: int = None, max_clusters: int = None, cnn_offset: int = None, sort_by_size: bool = True, info: bool = True, record: bool = True, record_time: bool = True, v: bool = True, purge: bool = False) → None¶

Execute clustering procedure

Parameters

radius_cutoff – Neighbour search radius.
cnn_cutoff – Similarity criterion.
member_cutoff – Valid clusters need to have at least this many members. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise and valid clusters have at least one member.
max_clusters – Keep only the largest max_clusters clusters. Passed on to Labels.sort_by_size() if sort_by_size is True. Has no effect otherwise.
cnn_offset – Exists for compatibility reasons and is subtracted from cnn_cutoff. If cnn_offset = 0, two points need to share at least cnn_cutoff neighbours to be part of the same cluster, without counting either of the two points. In former versions of the clustering, self-counting was included, so cnn_cutoff = 2 there is equivalent to cnn_cutoff = 0 in this version.
sort_by_size – Whether to sort (and trim) the created Labels instance. See also Labels.sort_by_size().
info – Whether to modify Labels.meta information for this clustering.
record – Whether to create a Record instance for this clustering which is appended to the Summary.
record_time – Whether to time clustering execution.
v – Be chatty.
purge – If True, force re-initialisation of cluster label assignments.
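How radius_cutoff, cnn_cutoff and cnn_offset interact can be sketched with a brute-force check on a handful of points. This is a simplified illustration, not the library's implementation: the helper names are hypothetical, neighbours are assumed to lie at a distance less than or equal to radius_cutoff, and the usual additional requirement that the two points are neighbours of each other is left out.

```python
import math


def neighbours(points, index, radius_cutoff):
    """Indices of points within radius_cutoff of points[index], excluding itself."""
    centre = points[index]
    return {
        j for j, other in enumerate(points)
        if j != index and math.dist(centre, other) <= radius_cutoff
    }


def share_enough_neighbours(points, a, b, radius_cutoff, cnn_cutoff, cnn_offset=0):
    """True if a and b share enough neighbours, not counting a and b themselves."""
    shared = neighbours(points, a, radius_cutoff) & neighbours(points, b, radius_cutoff)
    shared -= {a, b}
    # cnn_offset is subtracted from cnn_cutoff, as described above.
    return len(shared) >= cnn_cutoff - cnn_offset


points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.5, -0.5), (5.0, 5.0)]
two_shared = share_enough_neighbours(points, 0, 1, radius_cutoff=1.0, cnn_cutoff=2)
three_shared = share_enough_neighbours(points, 0, 1, radius_cutoff=1.0, cnn_cutoff=3)
legacy = share_enough_neighbours(points, 0, 1, radius_cutoff=1.0, cnn_cutoff=4, cnn_offset=2)
```

Points 0 and 1 share the two neighbours at indices 2 and 3, so the check passes for cnn_cutoff = 2 and fails for cnn_cutoff = 3. With cnn_offset = 2, a legacy-style cnn_cutoff = 4 reduces to the same requirement of two shared neighbours.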

fit_hierarchical(self, radius_cutoff: Union[float, List[float]], cnn_cutoff: Union[int, List[int]], member_cutoff: int = None, max_clusters: int = None, cnn_offset: int = None)¶: Execute hierarchical clustering procedure

property fitter¶

classmethod get_builder_kwargs(cls)¶

get_child(self, label)¶

property hierarchy_level¶: The level of this clustering in the hierarchical tree of clusterings (0 for the root instance).

info(self)¶

property input_data¶

isolate(self, bool purge: bool = True, bool isolate_input_data: bool = True)¶

Create child clusterings from cluster labels

Parameters

purge – If True, creates a new mapping for the children of this clustering.
isolate_input_data – If True, attaches a subset of the input data of this clustering to the child.

property labels¶: Direct access to cnnclustering._types.Labels.labels holding cluster label assignments for points in input_data.

make_parameters(self, double radius_cutoff: float, cnn_cutoff: int, current_start: int) → Type[ClusterParameters]¶

pie(self, ax=None, pie_props=None)¶

predict(self, other: Type[u'Clustering'], double radius_cutoff: float, cnn_cutoff: int, clusters: Optional[Sequence[int]] = None, cnn_offset: Optional[int] = None, info: bool = True, record: bool = True, record_time: bool = True, v: bool = True, purge: bool = False)¶

Execute prediction procedure

Parameters

other – cnnclustering.cluster.Clustering instance for which cluster labels should be predicted.
radius_cutoff – Neighbour search radius.
cnn_cutoff – Similarity criterion.
clusters – Sequence of cluster labels that should be included in the prediction.
cnn_offset – Exists for compatibility reasons and is subtracted from cnn_cutoff. If cnn_offset = 0, two points need to share at least cnn_cutoff neighbours to be part of the same cluster, without counting either of the two points. In former versions of the clustering, self-counting was included, so cnn_cutoff = 2 there is equivalent to cnn_cutoff = 0 in this version.
purge – If True, force re-initialisation of predicted cluster labels.

reel(self, depth: Optional[int] = None) → None¶

Wrap up label assignments of lower hierarchy levels

Parameters

depth – How many lower levels to consider. If None, consider all.

summarize(self, ax=None, unicode quantity: str = u'execution_time', treat_nan: Optional[Any] = None, convert: Optional[Any] = None, ax_props: Optional[dict] = None, contour_props: Optional[dict] = None, unicode plot_style: str = u'contourf')¶

Generate a 2D plot of record values

Record values (“time”, “clusters”, “largest”, “noise”) are plotted against cluster parameters (radius cutoff r and cnn cutoff c).

Parameters

ax – Matplotlib Axes to plot on. If None, a new Figure with Axes will be created.
quantity –
Record value to visualise:
- “time”
- “clusters”
- “largest”
- “noise”
treat_nan – If not None, use this value to pad nan-values.
ax_props – Used to style ax.
contour_props – Passed on to contour.

property summary¶: Return an instance of cnnclustering.cluster.Summary collecting clustering results for this clustering.

to_nx_DiGraph(self, ignore=None)¶

Convert cluster hierarchy to networkx DiGraph

Keyword Arguments: ignore – A set of labels to exclude from the graph. Use, for example, to exclude noise (label 0).

tree(self, ax=None, ignore=None, pos_props=None, draw_props=None)¶

trim_shrinking_leafs(self)¶

trim_trivial_leafs(self)¶

Scan cluster hierarchy for removable nodes

If the cluster label assignments on a clustering are all zero (noise), the clustering is considered trivial. In this case, the labels and children are reset to None.
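The trimming rule described above can be sketched on a small stand-in hierarchy. The Node class and trim_trivial function are hypothetical and only mimic the two attributes involved (labels and children); the traversal order is an assumption of this sketch.

```python
class Node:
    """Hypothetical stand-in for a clustering with labels and children."""

    def __init__(self, labels=None, children=None):
        self.labels = labels
        self.children = children


def trim_trivial(node):
    """Reset labels and children on every node whose labels are all noise (0)."""
    if node.children:
        for child in node.children.values():
            trim_trivial(child)
    if node.labels and all(label == 0 for label in node.labels):
        node.labels = None
        node.children = None


trivial_leaf = Node(labels=[0, 0, 0])
kept_leaf = Node(labels=[1, 1, 0], children={})
root = Node(labels=[1, 2, 2], children={1: trivial_leaf, 2: kept_leaf})
trim_trivial(root)
```

After the call, only the leaf with all-zero labels has been reset; the root and the other leaf keep their label assignments.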

class cnnclustering.cluster.ClusteringBuilder(data, preparation_hook=None, registered_recipe_key=None, clustering_type=None, alias=None, parent=None, **recipe)¶

Orchestrate correct initialisation of a clustering

Parameters

data – Data that should be clustered in a format compatible with ‘input_data’ specified in the building recipe. May go through preparation_hook to establish compatibility.

Keyword Arguments

preparation_hook – A function that takes input data as a single argument and returns the (optionally) reformatted data plus additional information (e.g. “meta”) in the form of an argument tuple and a keyword argument dictionary that can be used to initialise an input data type. If None, uses _default_preparation_hook.
recipe – Building instructions for a clustering initialisation. Should be a mapping of component keyword arguments to component type details.
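The contract of a preparation hook (one argument in, an argument tuple and a keyword argument dictionary out) can be sketched as follows. The function example_preparation_hook and the keys stored under “meta” are hypothetical; which arguments a concrete input data type actually expects depends on the building recipe.

```python
def example_preparation_hook(data):
    """Convert nested sequences to lists of floats and collect some meta data.

    Returns (args, kwargs) intended for initialising an input data type.
    """
    rows = [[float(value) for value in row] for row in data]
    meta = {
        "n_points": len(rows),
        "n_dim": len(rows[0]) if rows else 0,
    }
    return (rows,), {"meta": meta}


args, kwargs = example_preparation_hook([[1, 2], [3, 4]])
```

A hook of this shape would be passed as preparation_hook when constructing the builder; the builder then forwards args and kwargs to the input data type.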

aggregate_components(self)¶

build(self)¶: Initialise clustering with data and components

Records and summary¶

class cnnclustering.cluster.Record(n_points=None, radius_cutoff=None, cnn_cutoff=None, member_cutoff=None, max_clusters=None, n_clusters=None, ratio_largest=None, ratio_noise=None, execution_time=None)¶

Cluster result container

cnnclustering.cluster.Record instances can be created during cnnclustering.cluster.Clustering.fit() and are collected in cnnclustering.cluster.Summary.

to_dict(self)¶

class cnnclustering.cluster.Summary(iterable=None)¶

List-like container for cluster results

Stores instances of cnnclustering.cluster.Record.

insert(self, index, item)¶

to_DataFrame(self)¶

Convert list of records to (typed) pandas.DataFrame

Returns: pandas.DataFrame
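The relationship between records and the summary can be pictured with a small stand-in built on the standard library. RecordSketch and SummarySketch are hypothetical names; the sketch carries only a subset of the record fields, and the type check on insertion is an assumption rather than documented behaviour.

```python
from collections.abc import MutableSequence
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RecordSketch:
    """Stand-in for a cluster result record (subset of fields)."""
    n_points: Optional[int] = None
    radius_cutoff: Optional[float] = None
    cnn_cutoff: Optional[int] = None
    n_clusters: Optional[int] = None
    execution_time: Optional[float] = None

    def to_dict(self):
        return asdict(self)


class SummarySketch(MutableSequence):
    """List-like container that only accepts RecordSketch instances."""

    def __init__(self, iterable=None):
        self._items = []
        for item in iterable or []:
            self.append(item)  # append() is provided via insert()

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, item):
        self._check(item)
        self._items[index] = item

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, item):
        self._check(item)
        self._items.insert(index, item)

    @staticmethod
    def _check(item):
        if not isinstance(item, RecordSketch):
            raise TypeError("SummarySketch only stores RecordSketch instances")


summary = SummarySketch([RecordSketch(n_points=100, radius_cutoff=0.5, cnn_cutoff=2)])
summary.append(RecordSketch(n_points=100, radius_cutoff=0.7, cnn_cutoff=2))
rows = [record.to_dict() for record in summary]
```

A list of dictionaries like rows is the kind of structure from which a table with one row per record can be built.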

cnnclustering.cluster.make_typed_DataFrame(columns, dtypes, content=None)¶: Construct pandas.DataFrame with typed columns

cnnclustering.cluster.timed(function)¶: Decorator to measure execution time
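A timing decorator of this kind can be sketched with the standard library. The name timed_sketch is hypothetical, and returning the elapsed time alongside the result is an assumption of this sketch; the return format of the library's decorator is not documented here.

```python
import functools
import time


def timed_sketch(function):
    """Return a wrapper that yields (result, elapsed seconds)."""
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = function(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed
    return wrapper


@timed_sketch
def add(a, b):
    return a + b


value, seconds = add(1, 2)
```

time.perf_counter() is used because it is a monotonic, high-resolution clock intended for measuring short durations.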
