Estimators#
Heuristic estimation supporting intermediate processes elsewhere.
- cosasi.utils.estimators.bits_encode_integer(n)#
Estimates the number of bits required to encode an integer n>=1.
- Parameters
n (int) – an integer at least 1
Notes
Calculation is from Section 4.1 of [1].
References
- 1
B. A. Prakash, J. Vreeken, C. Faloutsos, “Efficiently spotting the starting points of an epidemic in a large graph” Knowledge and Information Systems, 2013 https://link.springer.com/article/10.1007/s10115-013-0671-5
- cosasi.utils.estimators.bits_encode_ripple(s, G, beta=0.01)#
Total description length of a seed set and its corresponding maximum likelihood propagation ripple.
- Parameters
s (array-like) – seed set
G (NetworkX Graph) – The original graph the infection process was run on.
beta (float) – infection probability
Notes
Calculation is from Section 4.3 of [1].
References
- 1
B. A. Prakash, J. Vreeken, C. Faloutsos, “Efficiently spotting the starting points of an epidemic in a large graph” Knowledge and Information Systems, 2013 https://link.springer.com/article/10.1007/s10115-013-0671-5
- cosasi.utils.estimators.bits_encode_seed(s, G)#
Number of bits required to identify a seed set (hypothesized infection source set).
- Parameters
s (array-like) – seed set
G (NetworkX Graph) – The original graph the infection process was run on.
Notes
Calculation is from Section 4.1 of [1].
References
- 1
B. A. Prakash, J. Vreeken, C. Faloutsos, “Efficiently spotting the starting points of an epidemic in a large graph” Knowledge and Information Systems, 2013 https://link.springer.com/article/10.1007/s10115-013-0671-5
- cosasi.utils.estimators.chatter(I, G)#
Estimates the number of sources of a graph diffusion process via the Chatter algorithm.
- Parameters
I (NetworkX Graph) – The infection subgraph observed at a particular time step
G (NetworkX Graph) – The graph the diffusion process was originally run on
- cosasi.utils.estimators.chatter_distance(G, t, u=None, v=None, normalized=True)#
Invokes the Chatter Algorithm/chatter frequency to obtain chatter distance, a graph topology metric.
- Parameters
G (NetworkX Graph) – The graph to analyze
t (int) – number of rounds to complete
u (node (optional)) – starting node. if not provided, we return an array of distances
v (node (optional)) – end node. if not provided, we return an array of distances
normalized (bool) – if True, all distances are scaled to have a max value of 1
Notes
The chatter distance between nodes u and v reflects the difficulty node u is expected to have in transmitting a message to node v.
- cosasi.utils.estimators.chatter_frequency(G, t=None)#
Implements the Chatter Algorithm described in Notes.
- Parameters
G (NetworkX Graph) – The graph to analyze
t (int or None (optional)) – number of rounds to complete if None, the algorithm runs until every node’s message is received by every other node at least 5 times.
Notes
Each node starts with a message bank consisting of its own ID. For t many rounds, each node broadcasts its message bank to its neighbors, and all nodes receiving messages append them to their own message bank. message_frequency[i][j] is the number of times i received j’s message.
A “naive”/pure message-passing formulation of this would be along the lines of:
def chatter_distance_slow(G, t): messages = {i:[i] for i in G} for _ in range(t): new_messages = copy.deepcopy(messages) for i in range(len(G)): for j in G.neighbors(i): new_messages[j] += messages[i] messages = new_messages return messages
where messages[i].count(j) is the number of times i received j’s message. But this is very slow and easily re-written as matrix multiplication, as is done here.
- cosasi.utils.estimators.description_length(s, G, beta=0.01)#
Implements a greedy heuristic to estimate the two-part minimal infection description length of a proposed set of infection sources.
- Parameters
s (array-like) – seed set
G (NetworkX Graph) – The original graph the infection process was run on.
beta (float) – infection probability
Notes
The minimal description length, as applied to source localization, is introduced in [1].
References
- 1
B. A. Prakash, J. Vreeken, C. Faloutsos, “Efficiently spotting the starting points of an epidemic in a large graph” Knowledge and Information Systems, 2013 https://link.springer.com/article/10.1007/s10115-013-0671-5
- cosasi.utils.estimators.eigengap(G)#
Returns the estimated number of clusters of G, based on the Eigengap of the normalized graph Laplacian.
- Parameters
G (NetworkX Graph) – The graph to analyze
Notes
The Eigengap heuristic is described in [1].
References
- 1
U. von Luxburg, “A Tutorial on Spectral Clustering” Statistics and Computing, 2007 https://link.springer.com/article/10.1007/s11222-007-9033-z
- cosasi.utils.estimators.number_sources(I, number_sources=None, return_source_subgraphs=True, number_sources_method='eigengap', G=None)#
Manages source subgraph estimation, mostly via spectral analysis and clustering.
- Parameters
I (NetworkX Graph) – The infection subgraph observed at a particular time step
number_sources (int or None (optional)) – if int, this is the hypothesized number of infection sources if None, estimates the number of sources via Eigengap heuristic
return_source_subgraphs (bool) – if True, returns subgraphs of I corresponding to each hypothesized infection source if False, does not return subgraphs of I corresponding to each hypothesized infection source
number_sources_method (str) – method for estimating the number of sources. one of the following options: - “eigengap” : uses the Eigengap of the normalized graph Laplacian to estimate the number of clusters - “netsleuth” : runs the multi-source NETSLEUTH algorithm and reports the number of seeds - “chatter” : invokes a spectral method based on the Chatter algorithm if number_sources != None, this doesn’t do anything
G (NetworkX Graph (optional)) – the original network the contagion process was run on generally optional (e.g. not needed for eigengap), occassionally required (e.g. needed for netsleuth)
Notes
If the diffusion process is brief or observation is early, and infection sources are sufficiently sparse, then the infected subgraphs corresponding to each infection source may be the connected components of the input graph. This is described in Section 2.6 of [1].
We estimate the number of infection sources by the minimum of the number of connected components and the Eigengap heuristic of the provided graph. The Eigengap heuristic is described in [2].
With a hypothesized number of infection sources in hand, we partition the graph via spectral clustering to provide a list of subgraphs corresponding to each infection source [3].
References
- 1
L. Ying and K. Zhu, “Diffusion Source Localization in Large Networks” Synthesis Lectures on Communication Networks, 2018
- 2
U. von Luxburg, “A Tutorial on Spectral Clustering” Statistics and Computing, 2007 https://link.springer.com/article/10.1007/s11222-007-9033-z
- 3
A. Damle and V. Minden and L. Ying “Simple, direct and efficient multi-way spectral clustering” Information and Inference: A Journal of the IMA, 2019 https://academic.oup.com/imaiai/article/8/1/181/5045955
- cosasi.utils.estimators.source_subgraphs(I, number_sources=2)#
Subdivides the provided graph into specified number of subgraphs via spectral clustering.
- Parameters
I (NetworkX Graph) – The infection subgraph observed at a particular time step
number_sources (int) – The hypothesized number of infection sources