by , ,
In many graph mining problems, two networks from different domains have to be matched. In the absence of reliable node attributes, graph matching relies only on the link structure of the two networks, which amounts to a generalization of the classic graph isomorphism problem. Graph matching has applications in social network reconciliation and de-anonymization, protein network alignment in biology, and computer vision. The most scalable graph matching approaches use ideas from percolation graph theory, where a matched node pair infects neighbouring node pairs as additional potential matches. This class of matching algorithms requires an initial seed set of known matches to start the percolation. The size and correctness of the matching is very sensitive to the size of the seed set. In this paper, we achieve a dramatic reduction in the required size of the seed set, with only a small increase in matching errors. We define a new percolation algorithm that tolerates temporary matching errors. This algorithm can operate in a regime with far fewer starting seeds than previous approaches. We rigorously characterise, in a random graph model when the two networks overlap only partially, the phase transition in the seed set size using ideas from bootstrap percolation theory. We also show the excellent performance in matching several real social networks with over a million nodes, using only a handful of seeds.
Growing a Graph Matching from a Handful of Seeds E. Kazemi, S. H. Hassani, M. GrossglauserIn Very Large Data Bases (VLDB), 2015
Bibtex Entry:
	author = {Ehsan Kazemi and S. Hamed Hassani and Matthias Grossglauser},
	booktitle = {Very Large Data Bases (VLDB)},
	title = {Growing a Graph Matching from a Handful of Seeds},
	year = {2015}}