364 Y. Wang et al. / Information Sciences 432 (2018) 362–375
u
1
u
2
u
3
u
4
v
1
v
2
v
3
v
4
Fig. 1. A bipartite graph from Example 1.
from the literature, including randomly generated classical instances [27] and a broad range of massive bipartite graphs
with nearly one billion edges. Experimental results demonstrate that PSRS significantly outperforms previous algorithms
and improves upon the best known solution quality for certain difficult instances.
In order to improve the performance of PSRS on massive bipartite graphs, we propose two additional heuristics. The
third heuristic is a two-mode perturbation (TMP) heuristic, which combines the greedy selection rule based on pscore with
a randomized selection strategy. PSRS typically chooses the pair of vertices for addition with the greatest pscore. However,
for massive bipartite graphs, it is very time consuming to find the pair of vertices with the greatest pscore, because there
are too many candidate vertex pairs. Additionally, most real-world massive bipartite graphs are very sparse, meaning pure
greedy heuristics can easily lead the search into local optima. Based on these two considerations, we improve the selection
heuristic by incorporating a randomized selection strategy with a certain probability, leading to the TMP heuristic. This
reduces the average cost of choosing the vertex pair for addition, and also introduces additional diversification.
The final heuristic is the k-bipartite core reduction rule (KCR), which is used to reduce the scale of massive bipartite
graphs by deleting vertices that are impossible to include in any optimal solutions. This rule is based on a heuristic called
the k-bipartite core, which is inspired by the heuristic of the k-core [18] .
We improve the PSRS algorithm by using the third and fourth heuristics, and the resulting algorithm is called PSRS+
(PSRS with TMP and KCR), which is more effective for massive graphs. We select 17 massive bipartite graphs from [9] and
use them to test the performances of EA/SM, PSRS, and PSRS+. Experiments demonstrate that PSRS+ greatly improves upon
PSRS and significantly outperforms EA/SM on all massive real graphs.
We also conduct experimental analysis and additional investigations on the heuristics presented in this work. Specifi-
cally, we compare PSRS and PSRS+ with several alternative versions that operate without using one of the aforementioned
heuristics. The experimental results demonstrate the effectiveness of the proposed heuristics.
In the next section, we introduce some necessary background knowledge and previous MBBP algorithms. We then pro-
pose a local search framework based on pair operations. Next, we describe our novel scoring function, self-adaptive restart-
ing heuristic, and PSRS algorithm. We then present the TMP heuristic and KCR reduction rule, and present the PSRS+ algo-
rithm. Sections 6 and 7 present the experimental results for PSRS and PSRS+, as well as experiments validating the effec-
tiveness of the proposed novel heuristics on some benchmarks. Finally, we provide some concluding remarks.
2. Background
2.1. Basic definitions and notations
Given a bipartite graph G = ( U, V, E ), G can be divided into two disjoint vertex sets U = { u
1
, u
2
, . . . , u
n
} and V = { v
1
,
v
2
, . . . , v
m
} such that every edge connects one vertex in U to one vertex in V. E = { e
1
, e
2
, . . . , e
t
} is the set of edges.
The neighborhood of a vertex u ∈ U is N ( u ) = { v ∈ V |( v, u ) ∈ E }. Similarly, the neighborhood of a vertex v ∈ V is N ( v ) = { u ∈ U |( v,
u ) ∈ E }. The degree of a vertex v is the size of its neighborhood and is denoted | N ( v )|. The size of a bipartite graph is defined
as max {| U |, | V |}. A balanced bipartite graph is a bipartite graph in which both disjoint sets of vertices are of the same
cardinality (i.e., | U| = | V | = n ).
Example 1. Fig. 1 presents a bipartite graph G = ( U, V, E ), where U = { u
1
, u
2
, u
3
, u
4
}, V = { v
1
, v
2
, v
3
, v
4
}, and E = {( u
1
, v
1
),
( u
1
, v
2
), ( u
2
, v
1
), ( u
2
, v
2
), ( u
2
, v
3
), ( u
2
, v
4
), ( u
3
, v
2
), ( u
3
, v
3
), ( u
3
, v
4
), ( u
4
, v
2
), ( u
4
, v
3
), ( u
4
, v
4
)}.
A biclique B = ( U
b
, V
b
, E
b
) of a given bipartite graph G is a subgraph of G where for each pair of vertices u ∈ U
b
and v ∈ V
b
,
( u, v ) ∈ E
b
. If | B | = | U
b
| = | V
b
| = k, then we say that B is a balanced biclique of size k . A balanced biclique is said to be
maximal if it cannot be extended to a larger balanced biclique. The MBBP problem is to find the balanced biclique with the
most vertices in a bipartite graph. In other words, k should be maximized. For example, in the graph depicted in Fig. 1 , the
vertices u
2
, u
3
, and u
4
, and v
2
, v
3
, and v
4
, as well as the nine edges between them, constitute a biclique of size three, which
happens to be the maximum balanced biclique in the bipartite graph.