Abstract
Background: The scale of online social networks (OSNs) makes it hard to obtain complete data from them, and the large number of nodes and links makes analysis of the full network time-consuming. How to sample a large-scale OSN so that the sample preserves the topological properties of the original network is therefore an open problem. This paper studies an unbiased sampling method that extracts a representative sample from the social graph.
Methods: We propose an improved algorithm based on the Metropolis-Hastings random walk (MHRW), called Unbiased Delay sampling (the UD algorithm). We then evaluate it by comparing it with sampling methods described in recent patents.
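For orientation, the baseline that UD builds on can be sketched as follows. This is a minimal illustration of the standard MHRW only, not of the UD algorithm: the delay mechanism and the parameter α are not specified in this abstract and are not modeled here. The function name `mhrw_sample` and the adjacency-dictionary graph format are assumptions made for the example, and the graph is assumed to be undirected and connected, with no isolated nodes.

```python
import random

def mhrw_sample(adj, start, steps, rng=None):
    """Standard Metropolis-Hastings random walk on an undirected graph.

    adj   : dict mapping each node to a list of its neighbours (assumed format)
    start : node at which the walk begins
    steps : number of walk steps to take
    Returns the list of visited nodes, repeats included.
    """
    rng = rng or random.Random()
    current = start
    visited = [current]
    for _ in range(steps):
        # Propose a neighbour chosen uniformly at random.
        candidate = rng.choice(adj[current])
        # Accept with probability min(1, deg(current) / deg(candidate));
        # this correction removes the bias towards high-degree nodes.
        if rng.random() < len(adj[current]) / len(adj[candidate]):
            current = candidate
        # On rejection the walk stays put, so the same node is recorded again.
        visited.append(current)
    return visited

# Hypothetical toy graph: one hub connected to four leaves.
star = {"hub": ["a", "b", "c", "d"],
        "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
walk = mhrw_sample(star, "hub", 1000, random.Random(0))
```

Rejected proposals leave the walk at its current node, which is why a plain MHRW sample contains repeated nodes; the repetition rate of the sample set is the quantity that the abstract says α controls.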
Results: Different sampling methods extract subnetworks with different topological properties. We find that UD adapts to networks with different connectivity. When repeated nodes are excluded from the sample, UD yields a better degree distribution; in addition, UD reduces the probability that already-sampled nodes are selected again, which improves its ability to discover the network.
Conclusion: To the best of our knowledge, this is the first unbiased sampling method that achieves a good degree distribution when the sample set contains no duplicate nodes. Specifically, we add a parameter α to the sampling process; the value of α controls the repetition rate of the sample set.
Keywords: Social network, MHRW, Twitter, degree distribution, independent sample, unbiased sampling.
Graphical Abstract