_types

This module provides a set of types that can be used as building blocks in the aggregation of a Clustering object.

Cluster parameters

class cnnclustering._types.ClusterParameters(double radius_cutoff: float, similarity_cutoff: int = 0, double similarity_cutoff_continuous: float = 0., n_member_cutoff: int = None, current_start: int = 1)

Input parameters for clustering procedure

Parameters

radius_cutoff – Neighbour search radius \(r\).

Keyword Arguments
  • similarity_cutoff – Value used to check the similarity criterion. In common-nearest-neighbours clustering, it is the minimum required number of shared neighbours \(c\).

  • similarity_cutoff_continuous – Same as similarity_cutoff but allowed to be a floating point value.

  • n_member_cutoff – Minimum required number of points in neighbour lists to be considered (tested in cnnclustering._types.Neighbours.enough). If None, will be set to similarity_cutoff.

  • current_start – Use this as the first label for identified clusters.

Members

to_dict
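The documented fallback of n_member_cutoff to similarity_cutoff can be illustrated with a minimal sketch. ClusterParametersSketch is a hypothetical pure-Python stand-in, not the Cython extension type, and what to_dict returns in the library is an assumption here.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ClusterParametersSketch:
    """Hypothetical stand-in mirroring the documented parameters."""

    radius_cutoff: float
    similarity_cutoff: int = 0
    similarity_cutoff_continuous: float = 0.0
    n_member_cutoff: Optional[int] = None
    current_start: int = 1

    def __post_init__(self):
        # Documented default: use similarity_cutoff if None was given
        if self.n_member_cutoff is None:
            self.n_member_cutoff = self.similarity_cutoff

    def to_dict(self):
        # Assumption: a plain dictionary of all parameters
        return asdict(self)


params = ClusterParametersSketch(radius_cutoff=0.5, similarity_cutoff=2)
print(params.to_dict())
```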

Cluster labels

class cnnclustering._types.Labels(labels, consider=None, *, meta=None)

Represents cluster label assignments

Parameters

labels – A container of integer cluster labels supporting the buffer protocol

Keyword Arguments
  • consider – A boolean (uint8) container of same length as labels indicating if a cluster label should be considered for assignment during clustering. If None, will be created as all true.

  • meta – Meta information. If None, will be created as empty dictionary.

n_points

The length of the labels container

labels

The labels container converted to a NumPy ndarray

meta

The meta information dictionary

consider

The consider container converted to a NumPy ndarray

mapping

A mapping of cluster labels to indices in labels

set

The set of cluster labels

consider_set

A set of cluster labels to consider for cluster label assignments

Members

from_sequence, sort_by_size
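As a rough illustration of the mapping and set attributes described above, the following sketch derives both from a plain list of labels. The helper label_mapping is hypothetical, and the library itself works on buffer-backed containers rather than lists.

```python
from collections import defaultdict


def label_mapping(labels):
    """Map each cluster label to the indices at which it occurs."""
    mapping = defaultdict(list)
    for index, label in enumerate(labels):
        mapping[label].append(index)
    return dict(mapping)


labels = [1, 1, 0, 2, 1]
# Documented default for consider: all true, same length as labels
consider = [1] * len(labels)

mapping = label_mapping(labels)
label_set = set(labels)
```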

Input data

Types used as input data to a clustering have to adhere to the input data interface, which is defined through InputDataExtInterface for Cython extension types. For pure Python types, the input data interface is defined through the abstract base class InputData and the specialised abstract classes InputDataComponents, InputDataPairwiseDistances, InputDataPairwiseDistancesComputer, InputDataNeighbourhoods, and InputDataNeighbourhoodsComputer.


class cnnclustering._types.InputDataExtInterface

Defines the input data interface for Cython extension types

compute_distances(self, InputDataExtInterface input_data)
compute_neighbourhoods(self, InputDataExtInterface input_data, AVALUE r, ABOOL is_sorted, ABOOL is_selfcounting)
get_builder_kwargs(type cls)
get_component(self, point: int, dimension: int) → int
get_distance(self, point_a: int, point_b: int) → int
get_n_neighbours(self, point: int) → int
get_neighbour(self, point: int, member: int) → int
meta

Type: dict

n_dim

Type: AINDEX

n_points

Type: AINDEX

class cnnclustering._types.InputData

Defines the input data interface

abstract property data

Return underlying data (only for user convenience, not to be relied on)

classmethod get_builder_kwargs(cls)
abstract get_subset(self, indices: Container) → Type[InputData]

Return input data subset

abstract property meta

Return meta-information

abstract property n_points

Return total number of points

class cnnclustering._types.InputDataComponents

Extends the input data interface

abstract get_component(self, point: int, dimension: int) → float

Return one component of point coordinates

abstract property n_dim

Return total number of dimensions

abstract to_components_array(self) → Type[np.ndarray]

Return input data as NumPy array of shape (#points, #components)
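A minimal sketch of a type that follows the component-based interface could look like this. ComponentsSketch is a hypothetical list-backed class that does not subclass the library's abstract base classes, and it returns nested lists where the interface asks for a NumPy array.

```python
class ComponentsSketch:
    """Hypothetical list-backed input data, for illustration only."""

    def __init__(self, data, meta=None):
        self._data = [list(point) for point in data]
        self._meta = {} if meta is None else meta

    @property
    def data(self):
        return self._data

    @property
    def meta(self):
        return self._meta

    @property
    def n_points(self):
        return len(self._data)

    @property
    def n_dim(self):
        return len(self._data[0]) if self._data else 0

    def get_component(self, point, dimension):
        return self._data[point][dimension]

    def get_subset(self, indices):
        return type(self)([self._data[i] for i in indices])

    def to_components_array(self):
        # Nested list of shape (#points, #components) instead of an ndarray
        return [row[:] for row in self._data]


points = ComponentsSketch([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
subset = points.get_subset([0, 2])
```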

class cnnclustering._types.InputDataPairwiseDistances

Extends the input data interface

abstract get_distance(self, point_a: int, point_b: int) → float

Return the pairwise distance between two points

class cnnclustering._types.InputDataPairwiseDistancesComputer

Extends the input data interface

abstract compute_distances(self, input_data: Type[InputData]) → None

Pre-compute pairwise distances

class cnnclustering._types.InputDataNeighbourhoods

Extends the input data interface

abstract get_n_neighbours(self, point: int) → int

Return number of neighbours for point

abstract get_neighbour(self, point: int, member: int) → int

Return a member for point

class cnnclustering._types.InputDataNeighbourhoodsComputer

Extends the input data interface

abstract compute_neighbourhoods(self, input_data: Type[InputData], double r: float, is_sorted: bool = False, is_selfcounting: bool = True) → None

Pre-compute neighbourhoods at radius

class cnnclustering._types.InputDataExtComponentsMemoryview

Implements the input data interface

Stores components as a Cython memoryview.

by_parts(self) → Iterator

Yield data by parts

Returns

Generator of 2D numpy.ndarray objects (parts)

get_component(self, point: int, dimension: int) → int
get_subset(self, indices: Sequence) → Type[InputDataExtComponentsMemoryview]
to_components_array(self)
class cnnclustering._types.InputDataExtDistancesLinearMemoryview

Implements the input data interface

Stores distances as a 1D memoryview.

class cnnclustering._types.InputDataExtNeighbourhoodsMemoryview

Implements the input data interface

Neighbours of points stored using a Cython memoryview.

get_n_neighbours(self, point: int) → int
get_neighbour(self, point: int, member: int) → int
get_subset(self, indices: Sequence) → Type[InputDataExtNeighbourhoodsMemoryview]

Return input data subset

class cnnclustering._types.InputDataNeighbourhoodsSequence(data: Sequence, *, meta=None)

Implements the input data interface

Neighbours of points stored as a sequence.

Parameters

data – Any sequence of neighbour index sequences (need to be sized, indexable, and iterable)

Keyword Arguments

meta – Meta-information dictionary.

property data
get_n_neighbours(self, point: int) → int
get_neighbour(self, point: int, member: int) → int
get_subset(self, indices: Container) → Type[InputDataNeighbourhoodsSequence]
property meta
property n_neighbours
property n_points
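The lookup methods of a sequence-backed neighbourhood store can be sketched as follows. NeighbourhoodsSequenceSketch is a hypothetical class that only mirrors the documented method names. It leaves out get_subset and n_neighbours, whose exact behaviour is not described here.

```python
class NeighbourhoodsSequenceSketch:
    """Hypothetical store for pre-computed neighbour index sequences."""

    def __init__(self, data, *, meta=None):
        # data: any sized, indexable, iterable sequence of index sequences
        self._data = data
        self._meta = {} if meta is None else meta

    @property
    def data(self):
        return self._data

    @property
    def meta(self):
        return self._meta

    @property
    def n_points(self):
        return len(self._data)

    def get_n_neighbours(self, point):
        # Number of neighbours stored for this point
        return len(self._data[point])

    def get_neighbour(self, point, member):
        # The member-th neighbour index stored for this point
        return self._data[point][member]


neighbourhoods = NeighbourhoodsSequenceSketch([[1, 2], [0], [0]])
```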
class cnnclustering._types.InputDataSklearnKDTree(data: Type[numpy.ndarray], *, meta=None, **kwargs)

Implements the input data interface

Components stored as a NumPy array. Neighbour queries delegated to pre-built KDTree.

build_tree(self, **kwargs)
clear_cached(self)
compute_neighbourhoods(self, input_data: Type[InputData], double radius: float, is_sorted: bool = False, is_selfcounting: bool = True)
property data
get_component(self, point: int, dimension: int) → float
get_n_neighbours(self, point: int) → int
get_neighbour(self, point: int, member: int) → int

Return a member for point

get_subset(self, indices: Container) → Type[InputDataSklearnKDTree]

Return input data subset

property meta
property n_dim
property n_neighbours
property n_points
to_components_array(self)

Neighbour containers

class cnnclustering._types.NeighboursExtInterface
assign(self, member: int)
contains(self, member: int)
enough(self, member_cutoff: int)
get_builder_kwargs(type cls)
get_member(self, index: int)
n_points

Type: AINDEX

reset(self)
class cnnclustering._types.Neighbours

Defines the neighbours interface

abstract assign(self, member: int) → None

Add a member to this container

abstract contains(self, member: int) → bool

Return True if member is in neighbours container

abstract enough(self, member_cutoff: int) → bool

Return True if there are enough points

classmethod get_builder_kwargs(cls)
abstract get_member(self, index: int) → int

Return the member stored at index in the neighbours container

abstract property n_points

Return total number of points

abstract property neighbours

Return point indices as NumPy array

abstract reset(self) → None

Reset/empty this container
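A list-backed container that follows the neighbours interface might be sketched as below. NeighboursListSketch is hypothetical and does not subclass the library's abstract base class. In particular, the strict comparison in enough is an assumption, since the interface only states that it returns True if there are enough points.

```python
class NeighboursListSketch:
    """Hypothetical list-backed neighbours container."""

    def __init__(self, neighbours=None):
        self._neighbours = [] if neighbours is None else list(neighbours)

    @property
    def n_points(self):
        return len(self._neighbours)

    def assign(self, member):
        # Add a member to this container
        self._neighbours.append(member)

    def contains(self, member):
        # Containment check by iteration (linear in the container length)
        return member in self._neighbours

    def enough(self, member_cutoff):
        # Assumption: "enough" means strictly more than the cutoff
        return self.n_points > member_cutoff

    def get_member(self, index):
        return self._neighbours[index]

    def reset(self):
        # Reset/empty this container
        self._neighbours = []


container = NeighboursListSketch()
for member in (4, 7, 9):
    container.assign(member)
```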

class cnnclustering._types.NeighboursExtVector

Implements the neighbours interface

Uses an underlying C++ std::vector.

Parameters

initial_size – Number of elements to reserve for the vector.

Keyword Arguments

neighbours – A sequence of labels suitable to be cast to a vector.

class cnnclustering._types.NeighboursExtCPPSet

Implements the neighbours interface

Uses an underlying C++ std::set.

Keyword Arguments

neighbours – A sequence of labels suitable to be cast to a C++ set.

class cnnclustering._types.NeighboursExtCPPUnorderedSet

Implements the neighbours interface

Uses an underlying C++ std::unordered_set.

Keyword Arguments

neighbours – A sequence of labels suitable to be cast to a C++ set.

class cnnclustering._types.NeighboursExtVectorCPPUnorderedSet

Implements the neighbours interface

Uses a combination of an underlying C++ std::vector and a std::unordered_set.

Keyword Arguments

neighbours – A sequence of labels suitable to be cast to a C++ vector.

class cnnclustering._types.NeighboursList(neighbours=None)

Implements the neighbours interface

assign(self, member: int)
contains(self, member: int) → bool
enough(self, member_cutoff: int) → bool
get_member(self, index: int) → int
property n_points
property neighbours
reset(self)
class cnnclustering._types.NeighboursSet(neighbours=None)

Implements the neighbours interface

assign(self, member: int)
contains(self, member: int) → bool
enough(self, member_cutoff: int)
get_member(self, index: int) → int
property n_points
property neighbours
reset(self)

Neighbours getter

class cnnclustering._types.NeighboursGetterExtInterface
get(self, AINDEX index, InputDataExtInterface input_data, NeighboursExtInterface neighbours, ClusterParameters cluster_params)
get_builder_kwargs(type cls)
get_other(self, AINDEX index, InputDataExtInterface input_data, InputDataExtInterface other_input_data, NeighboursExtInterface neighbours, ClusterParameters cluster_params)
is_selfcounting

Type: bool

is_sorted

Type: bool

class cnnclustering._types.NeighboursGetter

Defines the neighbours-getter interface

abstract get(self, index: int, input_data: Type[InputData], neighbours: Type[Neighbours], cluster_params: Type[ClusterParameters]) → None

Collect neighbours for point in input data

classmethod get_builder_kwargs(cls)
get_other(self, index: int, input_data: Type[InputData], other_input_data: Type[InputData], neighbours: Type[Neighbours], cluster_params: Type[ClusterParameters]) → None

Collect neighbours in input data for point in other input data

abstract property is_selfcounting

Return True if points count as their own neighbour

abstract property is_sorted

Return True if neighbour indices are sorted

class cnnclustering._types.NeighboursGetterExtBruteForce(distance_getter: Type[DistanceGetterExtInterface])

Implements the neighbours getter interface

This getter retrieves the neighbours of a point by comparing the distances (from a distance getter) between the point and all other points to the radius cutoff (\(r_{ij} \leq r\)).

The resulting neighbour containers are in general not sorted and include points as their own neighbour (self counting).

Parameters

distance_getter – An object implementing the distance getter interface. Has to be a Cython extension type.

get_builder_kwargs(type cls)
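The brute-force strategy described above can be sketched in a few lines. The function brute_force_neighbours is hypothetical: it uses a Euclidean distance directly instead of a distance getter and returns a list instead of filling a neighbours container.

```python
import math


def brute_force_neighbours(index, points, radius):
    """Collect all j with distance(points[index], points[j]) <= radius."""
    neighbours = []
    for j, other in enumerate(points):
        # The point itself has distance 0 and is therefore included
        # (self counting)
        if math.dist(points[index], other) <= radius:
            neighbours.append(j)
    return neighbours


points = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
neighbours_of_first = brute_force_neighbours(0, points, radius=1.0)
neighbours_of_last = brute_force_neighbours(2, points, radius=1.0)
```

Each query compares the point against all other points, so a single neighbour search is linear in the number of points.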
class cnnclustering._types.NeighboursGetterExtLookup

Implements the neighbours getter interface

class cnnclustering._types.NeighboursGetterBruteForce(distance_getter: Type[DistanceGetter])

Implements the neighbours getter interface

get(self, index: int, input_data: Type[InputData], neighbours: Type[Neighbours], cluster_params: Type[ClusterParameters])
classmethod get_builder_kwargs(cls)
get_other(self, index: int, input_data: Type[InputData], other_input_data: Type[InputData], neighbours: Type[Neighbours], cluster_params: Type[ClusterParameters])
property is_selfcounting
property is_sorted
class cnnclustering._types.NeighboursGetterLookup(is_sorted=False, is_selfcounting=False)

Implements the neighbours getter interface

get(self, index: int, input_data: Type[InputData], neighbours: Type[Neighbours], cluster_params: Type[ClusterParameters]) → None
get_other(self, index: int, input_data: Type[InputData], other_input_data: Type[InputData], neighbours: Type[Neighbours], cluster_params: Type[ClusterParameters])
property is_selfcounting
property is_sorted
class cnnclustering._types.NeighboursGetterRecomputeLookup(is_sorted=False, is_selfcounting=True)

Implements the neighbours getter interface

get(self, index: int, input_data: Type[InputData], neighbours: Type[Neighbours], cluster_params: Type[ClusterParameters]) → None
get_other(self, index: int, input_data: Type[InputData], other_input_data: Type[InputData], neighbours: Type[Neighbours], cluster_params: Type[ClusterParameters])
property is_selfcounting
property is_sorted

Distance getter

class cnnclustering._types.DistanceGetterExtInterface
get_builder_kwargs(type cls)
get_single(self, AINDEX point_a, AINDEX point_b, InputDataExtInterface input_data)
get_single_other(self, AINDEX point_a, AINDEX point_b, InputDataExtInterface input_data, InputDataExtInterface other_input_data)
class cnnclustering._types.DistanceGetter

Defines the distance getter interface

classmethod get_builder_kwargs(cls)
abstract get_single(self, point_a: int, point_b: int, input_data: Type[InputData]) → float

Get distance between two points in input data

abstract get_single_other(self, point_a: int, point_b: int, input_data: Type[InputData], other_input_data: Type[InputData]) → float

Get distance between two points in input data and other input data

class cnnclustering._types.DistanceGetterExtMetric

Implements the distance getter interface

get_builder_kwargs(type cls)
get_single(self, AINDEX point_a, AINDEX point_b, InputDataExtInterface input_data)
get_single_other(self, AINDEX point_a, AINDEX point_b, InputDataExtInterface input_data, InputDataExtInterface other_input_data)
class cnnclustering._types.DistanceGetterExtLookup

Implements the distance getter interface

class cnnclustering._types.DistanceGetterMetric(metric: Type[Metric])

Implements the distance getter interface

classmethod get_builder_kwargs(cls)
get_single(self, point_a: int, point_b: int, input_data: Type[InputData])
get_single_other(self, point_a: int, point_b: int, input_data: Type[InputData], other_input_data: Type[InputData])
class cnnclustering._types.DistanceGetterLookup

Implements the distance getter interface

get_single(self, AINDEX point_a, AINDEX point_b, InputDataExtInterface input_data)
get_single_other(self, AINDEX point_a, AINDEX point_b, InputDataExtInterface input_data, InputDataExtInterface other_input_data)

Metrics

class cnnclustering._types.MetricExtInterface

Defines the metric interface for extension types

adjust_radius(self, AVALUE radius_cutoff) → float
calc_distance(self, AINDEX index_a, AINDEX index_b, InputDataExtInterface input_data) → float
calc_distance_other(self, AINDEX index_a, AINDEX index_b, InputDataExtInterface input_data, InputDataExtInterface other_input_data) → float
get_builder_kwargs(type cls)
class cnnclustering._types.Metric

Defines the metric-interface

abstract calc_distance(self, index_a: int, index_b: int, input_data: Type[InputData]) → float

Return distance between two points in input data

abstract calc_distance_other(self, index_a: int, index_b: int, input_data: Type[InputData], other_input_data: Type[InputData]) → float

Return distance between two points in input data and other input data
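To make the two methods of the metric interface concrete, here is a minimal Euclidean sketch. EuclideanSketch is hypothetical and operates on nested lists. It assumes that index_a refers to input_data and index_b to other_input_data, which the interface description does not spell out.

```python
import math


class EuclideanSketch:
    """Hypothetical metric working on nested lists of coordinates."""

    def calc_distance(self, index_a, index_b, input_data):
        # Both points come from the same data set
        return math.dist(input_data[index_a], input_data[index_b])

    def calc_distance_other(
            self, index_a, index_b, input_data, other_input_data):
        # Assumption: index_a indexes input_data,
        # index_b indexes other_input_data
        return math.dist(input_data[index_a], other_input_data[index_b])


metric = EuclideanSketch()
data = [[0.0, 0.0], [3.0, 4.0]]
other_data = [[0.0, 0.0], [6.0, 8.0]]
```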

class cnnclustering._types.MetricExtDummy

Implements the metric interface

class cnnclustering._types.MetricExtPrecomputed

Implements the metric interface

class cnnclustering._types.MetricExtEuclidean

Implements the metric interface

class cnnclustering._types.MetricExtEuclideanReduced

Implements the metric interface

class cnnclustering._types.MetricExtEuclideanPeriodicReduced

Implements the metric interface

class cnnclustering._types.MetricDummy

Implements the metric interface

adjust_radius(self, double radius_cutoff: float) → float
calc_distance(self, index_a: int, index_b: int, input_data: Type[InputData]) → float
calc_distance_other(self, index_a: int, index_b: int, input_data: Type[InputData], other_input_data: Type[InputData]) → float
class cnnclustering._types.MetricEuclidean

Implements the metric interface

adjust_radius(self, double radius_cutoff: float) → float
calc_distance(self, index_a: int, index_b: int, input_data: Type[InputData]) → float
calc_distance_other(self, index_a: int, index_b: int, input_data: Type[InputData], other_input_data: Type[InputData]) → float
class cnnclustering._types.MetricEuclideanReduced

Implements the metric interface

adjust_radius(self, double radius_cutoff: float) → float
calc_distance(self, index_a: int, index_b: int, input_data: Type[InputData]) → float
calc_distance_other(self, index_a: int, index_b: int, input_data: Type[InputData], other_input_data: Type[InputData]) → float

Similarity checker

class cnnclustering._types.SimilarityCheckerExtInterface

Defines the similarity checker interface for extension types

check(self, NeighboursExtInterface neighbours_a, NeighboursExtInterface neighbours_b, ClusterParameters cluster_params)
get_builder_kwargs(type cls)
class cnnclustering._types.SimilarityChecker

Defines the similarity checker interface

abstract check(self, neighbours_a: Type[Neighbours], neighbours_b: Type[Neighbours], cluster_params: Type[ClusterParameters]) → bool

Return True if a and b have sufficiently many common neighbours

classmethod get_builder_kwargs(cls)
class cnnclustering._types.SimilarityCheckerExtContains

Implements the similarity checker interface

Strategy:

Loops over the members of one neighbours container and checks whether each is contained in the other neighbours container. Breaks early when the similarity criterion is reached. The performance and time complexity of the check depend on the neighbour containers used. The worst-case time complexity is \(\mathcal{O}(n \cdot m)\), with \(n\) and \(m\) being the lengths of the neighbours containers, if the containment check is performed by iteration. It is \(\mathcal{O}(n)\) if the containment check can be performed as a lookup in constant time. Note that the neighbours containers are not switched to ensure that the first container is the shorter one (compare cnnclustering._types.SimilarityCheckerExtSwitchContains).
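The strategy can be sketched in plain Python as follows. The function check_contains is hypothetical. It takes the required number of shared neighbours directly instead of a ClusterParameters object and assumes that a cutoff of 0 is always satisfied.

```python
def check_contains(neighbours_a, neighbours_b, similarity_cutoff):
    """Return True once similarity_cutoff common members are found."""
    if similarity_cutoff == 0:
        # Assumption: nothing to check if no shared neighbour is required
        return True

    common = 0
    for member in neighbours_a:
        # Cost of this containment check depends on the container type
        if member in neighbours_b:
            common += 1
            if common == similarity_cutoff:
                return True  # Break early
    return False


with_list = check_contains([1, 2, 3, 4], [2, 4, 6], 2)
with_set = check_contains([1, 2, 3, 4], {2, 4, 6}, 2)
```

With a list as the second container each containment check iterates over the list, while a set allows a hash lookup per member.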

class cnnclustering._types.SimilarityCheckerExtSwitchContains

Implements the similarity checker interface

Strategy:

Loops over the members of one neighbours container and checks whether each is contained in the other neighbours container. Breaks early when the similarity criterion is reached. The performance and time complexity of the check depend on the neighbour containers used. The worst-case time complexity is \(\mathcal{O}(n \cdot m)\), with \(n\) and \(m\) being the lengths of the neighbours containers, if the containment check is performed by iteration. It is \(\mathcal{O}(n)\) if the containment check can be performed as a lookup in constant time. Note that the neighbours containers are switched if necessary, so that the first container is the shorter one (compare SimilarityCheckerExtContains).
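The only difference to the plain containment strategy is the swap before the loop, as this sketch shows. The function check_switch_contains is hypothetical and takes the required number of shared neighbours directly.

```python
def check_switch_contains(neighbours_a, neighbours_b, similarity_cutoff):
    """Containment check that iterates over the shorter container."""
    if similarity_cutoff == 0:
        # Assumption: nothing to check if no shared neighbour is required
        return True

    # Switch so that the outer loop runs over the shorter container
    if len(neighbours_a) > len(neighbours_b):
        neighbours_a, neighbours_b = neighbours_b, neighbours_a

    common = 0
    for member in neighbours_a:
        if member in neighbours_b:
            common += 1
            if common == similarity_cutoff:
                return True  # Break early
    return False


long_container = set(range(1000))
short_container = {3, 5, 2000}
result = check_switch_contains(long_container, short_container, 2)
```

In this usage example the loop visits at most three members instead of up to a thousand.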

class cnnclustering._types.SimilarityCheckerExtScreensorted

Implements the similarity checker interface

Strategy:

Loops over the members of two neighbour containers alternately and checks whether neighbours are contained in both containers. Requires that the containers are sorted in ascending order to return the correct result. Sorting is neither checked nor enforced. Breaks early when the similarity criterion is reached. The performance of the check depends on the neighbour containers used. The worst-case time complexity is \(\mathcal{O}(n + m)\), with \(n\) and \(m\) being the lengths of the neighbours containers.
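A two-pointer walk over both sorted containers is one way to realise this strategy. The following is a sketch under that assumption, not the extension type's actual code. The function check_screensorted is hypothetical and expects indexable sequences sorted in ascending order.

```python
def check_screensorted(neighbours_a, neighbours_b, similarity_cutoff):
    """Count common members of two ascending sequences."""
    if similarity_cutoff == 0:
        # Assumption: nothing to check if no shared neighbour is required
        return True

    i = j = common = 0
    while i < len(neighbours_a) and j < len(neighbours_b):
        a, b = neighbours_a[i], neighbours_b[j]
        if a == b:
            common += 1
            if common == similarity_cutoff:
                return True  # Break early
            i += 1
            j += 1
        elif a < b:
            i += 1  # Advance in the container with the smaller value
        else:
            j += 1
    return False


shared_two = check_screensorted([1, 3, 5, 7], [2, 3, 4, 7, 9], 2)
shared_three = check_screensorted([1, 3, 5, 7], [2, 3, 4, 7, 9], 3)
```

Every step advances at least one of the two positions, which gives the \(\mathcal{O}(n + m)\) bound.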

class cnnclustering._types.SimilarityCheckerContains

Implements the similarity checker interface

Strategy:

Loops over the members of one neighbours container and checks whether each is contained in the other neighbours container. Breaks early when the similarity criterion is reached. The performance and time complexity of the check depend on the neighbour containers used. The worst-case time complexity is \(\mathcal{O}(n \cdot m)\), with \(n\) and \(m\) being the lengths of the neighbours containers, if the containment check is performed by iteration. It is \(\mathcal{O}(n)\) if the containment check can be performed as a lookup in constant time. Note that the neighbours containers are not switched to ensure that the first container is the shorter one (compare cnnclustering._types.SimilarityCheckerSwitchContains).

check(self, neighbours_a: Type[Neighbours], neighbours_b: Type[Neighbours], cluster_params: Type[ClusterParameters]) → bool
class cnnclustering._types.SimilarityCheckerSwitchContains

Implements the similarity checker interface

Strategy:

Loops over the members of one neighbours container and checks whether each is contained in the other neighbours container. Breaks early when the similarity criterion is reached. The performance and time complexity of the check depend on the neighbour containers used. The worst-case time complexity is \(\mathcal{O}(n \cdot m)\), with \(n\) and \(m\) being the lengths of the neighbours containers, if the containment check is performed by iteration. It is \(\mathcal{O}(n)\) if the containment check can be performed as a lookup in constant time. Note that the neighbours containers are switched if necessary, so that the first container is the shorter one (compare cnnclustering._types.SimilarityCheckerContains).

check(self, neighbours_a: Type[Neighbours], neighbours_b: Type[Neighbours], cluster_params: Type[ClusterParameters]) → bool

Queues

Queues can be optionally used by a fitter, e.g.

  • FitterExtBFS

  • FitterBFS


class cnnclustering._types.QueueExtInterface
get_builder_kwargs(type cls)
is_empty(self) → bool
pop(self) → int
push(self, value: int)

class cnnclustering._types.Queue

Defines the queue interface

classmethod get_builder_kwargs(cls)
abstract is_empty(self) → bool

Return True if there are no values in the queue

abstract pop(self)

Retrieve value from the queue

abstract push(self, value)

Put value into the queue


class cnnclustering._types.QueueExtLIFOVector

Implements the queue interface


class cnnclustering._types.QueueExtFIFOQueue

Implements the queue interface


class cnnclustering._types.QueueFIFODeque

Implements the queue interface

is_empty(self) → bool

Return True if there are no values in the queue

pop(self)

Retrieve value from front/left end

push(self, value)

Append value to back/right end
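The documented behaviour (append to the back/right end, retrieve from the front/left end) maps directly onto collections.deque. QueueFIFODequeSketch below is a hypothetical re-implementation for illustration and does not subclass the library's Queue.

```python
from collections import deque


class QueueFIFODequeSketch:
    """Hypothetical first-in, first-out queue backed by a deque."""

    def __init__(self):
        self._queue = deque()

    def push(self, value):
        # Append value to back/right end
        self._queue.append(value)

    def pop(self):
        # Retrieve value from front/left end
        return self._queue.popleft()

    def is_empty(self):
        # Return True if there are no values in the queue
        return not self._queue


queue = QueueFIFODequeSketch()
for value in (1, 2, 3):
    queue.push(value)

popped = []
while not queue.is_empty():
    popped.append(queue.pop())
```

Values come out in the order they were pushed, which is what a breadth-first search needs.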