Abstract
Dominance structure of animal societies is likely to be more complex than simple linearity, but few, if any, methods exist for quantifying nonlinear structure. Our R package “Perc” presented a new network-based method, called Percolation and Conductance, which permits nonlinear structure in dominance interactions. It uses information from both direct and indirect dominance pathways to calculate consistency in the direction of transitive dominance pathways from A to B (e.g. via pathway through C, D, and/or E). Greater consistency results in higher certainty that A outranks B. It creates a matrix of probabilities that the row individual outranks the column individual. It performs simulated annealing processes to explore possible rank orders and finds the best possible rank order. By applying simple R functions, researchers will be able to quantify rank order and dominance uncertainty with the help of the package.
Rank relationships amongst members of a social group are most commonly represented as an ordered list of individuals that are ranked from highest to lowest. Numerous methods exist to identify this rank order, and most methods fall under one of two approaches: (1) finding the appropriate ranking by reordering the rows and columns of a win/loss matrix or (2) calculating a cardinal dominance index which ranks individuals by the proportion of others dominated, both of which are based upon a data set of agonistic and/or submissive interactions. Such methods produce a linear rank order, whether or not the society is expected to have a linear hierarchy or whether the data fit the assumption of linearity (which is particularly problematic when calculating cardinal ranks (Shev et al 2012)). Most researchers agree that the dominance structure of animal societies is likely to be more complex than simple linearity, but few, if any, methods exist for quantifying nonlinear structure. We present a new network-based method, called Percolation and Conductance, that permits nonlinear structure to emerge via estimates of network-wide directional consistency in the flow of dominance interactions and detection of blocks of dominance ambiguity that are indicative of nonlinear segments of a hierarchy. An additional benefit of this method is the ability to quantify dyadic-level dominance certainty. Dyadic dominance potentials are calculated using multiple indirect dominance pathways in the network (via common third parties) to infer missing data in the win/loss matrix and enhance the calculation of dominance potentials from direct interactions. Greater consistency in the direction of transitive dominance pathways from A to B (e.g. via pathway through C, D, and/or E) result in higher certainty that A outranks B, whereas greater evidence of inconsistent direction (e.g. some directed pathways go from A to B whereas other directed pathways go from B to A) result in dominance ambiguity. Because the number of indirect pathways is typically exponentially larger than direct win/loss interactions, one can be relatively confident that dyads with ambiguous dominance potentials are truly ambiguous due to evidence of directional inconsistency in their network pathways rather than lack of data.
Rank order using the percolation-conductance method is based on a matrix that combines information from direct win/loss interactions with information from indirect pathways as described above to create a matrix of probabilities that the row individual outranks the column individual. Percolation-conductance finds these multi-step, directed pathways between all pairs of nodes by performing a series of random walks through the network. A simulated annealing process explores possible rank orders (from the space of all possible rank orders) – the more simulated annealing runs that are performed, a larger number of possible rank orders are explored, meaning that the best possible rank order is more likely to be found. The starting location of simulated annealing is determined by the total wins per individual in the matrix. Multiple simulated annealing steps are required to arrive at the optimal rank order, keeping in mind that multiple rank orders may be equally good if there are nonlinear segments in the hierarchy. The heat map function (plot.conf.mat) will plot the ranking and can allow for the detection of non-linear dominance structures. A block of individuals with probabilities near 0.5 indicates a subgroup of individuals whose relationships are not clearly defined. A second metric yielded by this method, dominance uncertainty, reflects the directional consistency of information flow through the direct and indirect pathways in a directed and weighted network.
The Percolation-Conductance method of quantifying rank and dominance uncertainty is computation complicated and requires researchers to have a sophisticated computational background to process the data. Our group, in collaboration with collaboraters in UC Davis Statistics Deparment, developed an R package that implements the percolation-conductance algorithm on directed dyadic interactions. By applying simple R functions, researchers will be able to quantify rank order and dominance uncertainty with the help of the package. The goal of this tutorial is to show how to quantify rank and dominance uncertainty from win-loss raw data.
The function as.conflictmat
is used to import raw
edgelists or matrices for further analysis. Raw edgelists or matrices
are records for directed dyadic win-loss interactions. The prefered R
data types for as.conflictmat
are “matrix
” and
“data.frame
”. Raw edgelists or matrices can be imported as
R data type of either data.frame
or matrix
to
be used in as.conflictmat
.
The most simple and commonly used raw data is a edgelist of two columns consisting all dyadic win-loss interactions, with the first columns being the winner and the second column being the loser. Duplicates are allowed in this two-column edgelist because multiple interactions between a pair are represented by multiple rows of the same winner and loser. Below is an example of a two-column edgelist.
## Iname Rname
## 1 Kale Kibitz
## 2 Kale Kibitz
## 3 Kale Kibitz
## 4 Kale Kibitz
## 5 Kale Kolyma
A weighted edgelist is also allowed as raw edgelist
data. It is a “matrix
” or “data.frame
” of
three columns, with the first two columns being winner IDs and loser IDs
respectively and the third column being the frequency of the win-loss
interaction between the winner and the loser. An example of a weighted
edgelsit is shown below:
## Initiator1 Recipient1 Freq
## 1 Angelina Alexis 1
## 2 Angelina Darcy 1
## 3 Angelina FirstTimeMom 1
## 4 Angelina Mak 1
## 5 Angelina Tabitha 3
In addition, a matrix representing dyadic win-loss interaction is also accepted as raw data. For example,
## Alexis Angelina Aristotle BaldSpotMom C19
## Alexis 0 0 0 0 0
## Angelina 1 0 0 0 0
## Aristotle 0 0 0 0 0
## BaldSpotMom 6 0 0 0 0
## C19 0 0 0 0 0
The function as.conflictmat
is used to import raw data
of any accepted format, transform raw data to a
“conflict matrix
” for further analysis.
“conflict matrix
” is very specifically refered to a class
in R. If your raw data is an edgelist of either two- or three- column,
it will transform the your raw edgelist to a matrix in which counts
representing for how many times a row wins over a column. Running this
on a win-loss matrix creates an identical conflict matrix. This is
required for all data formats to create R object of class conflict
matrix.
Below is an example on transforming a two-column edgelist:
# convert an two-column edgelist to conflict matrix
confmatrix <- as.conflictmat(sampleEdgelist)
# displaying the first 5 rows and columns of the converted matrix
confmatrix[1:5, 1:5]
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0 0 0 0 0
## Kale 0 0 0 0 4
## Kalleigh 0 1 0 0 0
## Keira 0 0 0 0 0
## Kibitz 0 6 0 0 0
Below is an example on transforming a raw win-loss matrix:
# convert a win-loss matrix to conflict matrix
confmatrix2 <- as.conflictmat(sampleRawMatrix)
# displaying the first 5 rows and columns of the converted matrix
confmatrix2[1:5, 1:5]
## Alexis Angelina Aristotle BaldSpotMom C19
## Alexis 0 0 0 0 0
## Angelina 1 0 0 0 0
## Aristotle 0 0 0 0 0
## BaldSpotMom 6 0 0 0 0
## C19 0 0 0 0 0
When importing weighted edgelist (three-column), you’ll need to
specify weighted = TRUE
if your raw data is a weighted
edgelist. Below is an example:
Directionality is very importnat when using the
function as.conflictmat
to import raw data. By default,
winners (sources) are in the first column and loses (targets) are in the
second column for edgelists. Row names are assumed to be the winner for
a raw win-loss matrix. Values in a raw win-loss matrix describe that for
how many times the corresponding row ID win over the corresponding
column ID. If the directionality in your raw data happens to be opposite
from the default, you could specify swap.order = TRUE
in
as.conflictmat
. It will tell R that the directionality in
your raw data is the opposite of default setting. Running
as.conflictmat
on your raw data with
swap.order = TRUE
will automatically correct the
directionality when transforming your raw data to a
conflict matrix
. Here is an example,
## Alexis Angelina Aristotle BaldSpotMom C19
## Alexis 0 0 0 0 0
## Angelina 1 0 0 0 0
## Aristotle 0 0 0 0 0
## BaldSpotMom 6 0 0 0 0
## C19 0 0 0 0 0
# use swap.order = TRUE when running as.conflictmat.
confmatrix4 <- as.conflictmat(sampleRawMatrix, swap.order = TRUE)
# displaying the first 5 rows and columns of the converted matrix
confmatrix4[1:5, 1:5]
## Alexis Angelina Aristotle BaldSpotMom C19
## Alexis 0 1 0 6 0
## Angelina 0 0 0 0 0
## Aristotle 0 0 0 0 0
## BaldSpotMom 0 0 0 0 0
## C19 0 0 0 0 0
Before computing the dominance probability and finding the best rank order, you want to know whether or not long pathways are present in your data, and if present, whether or not these long pathways can be useful for gathering additional information about rank relationships.
You could find pathways of particular length that starting at a particular node (ID). For example,
# Testing findIDpaths.R
pathKuai <- findIDpaths(confmatrix, "Kuai", len = 3)
# Displaying the first 5 rows of pathKuai
pathKuai[1:5, ]
## [,1] [,2] [,3] [,4]
## [1,] "Kuai" "Kalleigh" "Kale" "Kibitz"
## [2,] "Kuai" "Kalleigh" "Kale" "Kolyma"
## [3,] "Kuai" "Kalleigh" "Kimora" "Kibitz"
## [4,] "Kuai" "Kalleigh" "Kimora" "Kioga"
## [5,] "Kuai" "Kalleigh" "Kimora" "Koppy"
# When there's no pathway starting at a particular individual, you'll get an output like the example below:
pathKalani <- findIDpaths(confmatrix, ID = "Kalani", len = 2)
## no pathways of length 2 found starting at Kalani
You could also use transitivity
to find out information
on transitivity, a measure of how consistent or inconsistent the flow of
information is through the network as a whole. This procedure looks for
the number of triangles (e.g. A→B→C→A) and determines whether or not
they are transitive.
## [1] 14
## [1] 4
## [1] 0.7777778
## [1] 6.468627
The function transitivity
allow estimation for
alpha
, which is a value used in the conductance procedure
to weight the information from the indirect pathways. The higher the
transitivity of a network (the higher the alpha), the greater the weight
the conductance procedure will give the indirect pathways.
After transforming your raw data in an R object of
conflict matrix
class, we will use the function
conductance
to find all indirect pathways of a particular
length (defined by maxLength
) and update the conflict
matrix. The maximum length of indirect pathways should be decided based
on the data. We use a max length of 4 for our dominance rank. This
yields the dominance probability matrix which represents the probability
the row outranks the column. Values in the matrix are from 0 - 1, with
0.5 being the most uncertain. If a value is less than 0.5, the
corresponding row ID is more likely to lose against the corresponding
column ID; if a value is greater than 0.5, the corresponding row ID is
more likely to win over the corresponding column ID.
DominanceProbability
is a list of 2 elements. The first
element, named imputed.conf
is the updated conflict matrix
which incorporates information from indirect pathways from the original
conflict matrix. Comparing the original conflict matrix with the updated
conflict matrix will show the information that is gained through the
indirect pathways:
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0 0 0 0 0
## Kale 0 0 0 0 4
## Kalleigh 0 1 0 0 0
## Keira 0 0 0 0 0
## Kibitz 0 6 0 0 0
# displaying the first 5 rows and columns of the imputed conflict matrix
DominanceProbability$imputed.conf[1:5, 1:5]
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0.00000000 0.000000 0 0.00000000 0.0000000
## Kale 0.00000000 0.000000 0 0.00000000 4.0557534
## Kalleigh 0.05575338 1.000000 0 0.11150676 0.2230135
## Keira 0.00000000 0.000000 0 0.00000000 0.0000000
## Kibitz 0.00000000 6.055753 0 0.05575338 0.0000000
Examining information gained through indirect pathways will provide
you information to decide what is the appropriate maxLength
for your data. You could examine the information gained through indirect
pathways by substracting the original conflict matrix from imputed
conflict matrix. The information gained could visually examined by
generating a corresponding heatmap using the function
plot.conf.mat
.
# substracting the original conflict matrix from imputed conflict matrix.
informationGain <- DominanceProbability$imputed.conf - confmatrix
# examine the first five rows and columns of the informationGain
informationGain[1:5, 1:5]
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0.00000000 0.00000000 0 0.00000000 0.00000000
## Kale 0.00000000 0.00000000 0 0.00000000 0.05575338
## Kalleigh 0.05575338 0.00000000 0 0.11150676 0.22301352
## Keira 0.00000000 0.00000000 0 0.00000000 0.00000000
## Kibitz 0.00000000 0.05575338 0 0.05575338 0.00000000
# generating a heatmap representing information gained by using informatio from indirect pathways.
plotConfmat(informationGain, ordering = NA, labels = TRUE)
As we could see from the heatmap, using maxLength = 2
does provide us more information than the original raw conflict matrix
that we imported from the raw data. In real analysis, you may want to
consider using maxLength = 3
or even more and repeat this
diagnosis process to examine the information gain. This will allow you
to have a better idea on what is the appropriate maxLength
for your data.
For example, if you want to examine whether or not
maxLength = 3
provide more information than
maxLength = 2
. Here is an example:
DominanceProbabilityLength3 <- conductance(confmatrix, maxLength = 3)
informationGain2 <- DominanceProbabilityLength3$imputed.conf - DominanceProbability$imputed.conf
plotConfmat(informationGain2, ordering = NA, labels = TRUE)
As we can see in this heatmap that using maxLength = 3
still gives you some more information. In real analysis, you may want to
repeat the analysis and try maxLength = 4
. In addition, you
may also want to consider your alpha value to decide how much you want
to trust those long pathways.
The second element of DominanceProbability
the dominance
probability matrix, named p.hat
. Here’s how you could pull
out the dominance probability matrix:
# displaying the first 5 rows and columns of the dominance probability matrix
DominanceProbability$p.hat[1:5, 1:5]
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0.0000000 0.5000000 0.4236125 0.5000000 0.5000000
## Kale 0.5000000 0.0000000 0.1180829 0.5000000 0.4040371
## Kalleigh 0.5763875 0.8819171 0.0000000 0.6325280 0.7095211
## Keira 0.5000000 0.5000000 0.3674720 0.0000000 0.4236125
## Kibitz 0.5000000 0.5959629 0.2904789 0.5763875 0.0000000
When you want to use dominance probability matrix for further
analysis, often a long-format representation of the matrix is easier.
The function dyadicLongConverter
convert win-loss
uncertainty to long format.
# displaying the first 5 rows of the converted long format win-loss probability.
dyadicLongConverter(DominanceProbability$p.hat)[1:5, ]
## ID1 ID2 ID1 Win Probability ID2 Win Probability RankingCertainty
## 12 Kalani Kale 0.5000000 0.5000000 0.5000000
## 23 Kalani Kalleigh 0.4236125 0.5763875 0.5763875
## 24 Kale Kalleigh 0.1180829 0.8819171 0.8819171
## 34 Kalani Keira 0.5000000 0.5000000 0.5000000
## 35 Kale Keira 0.5000000 0.5000000 0.5000000
Simulated rank order is computed using a simulated annealing
algorithm by running the function simRankOrder
on
DominanceProbability$p.hat
. The argument num
is the number of simulated annealing runs to generate best rank order.
The argument kmax
tells you how long (perhaps iterations)
the simulated annealing function looks around for a local optimum to
find the best rank order. By default, kmax = 1000
was used.
But it takes a long time to run when using kmax = 1000
. The
example below uses kmax = 10 just because it runs quickly and it is
enough to illustrate the process, however this value is not recommended
when processing real data. Here is how to use simRankOrder
to find the simulated rank order:
The function s.rank
returns a list containing simulated
rank order and Costs for each simulated annealing run. The best rank
order will be the one with the lowest cost.
## ID ranking
## 1 Koppy 1
## 2 Kioga 2
## 3 Kuai 3
## 4 Kalleigh 4
## 5 Kyushu 5
## simAnnealRun Cost
## 1 1 2.321503
## 2 2 2.392065
## 3 3 2.321503
## 4 4 2.321503
## 5 5 2.392065
## 6 6 2.321503
## 7 7 2.392065
## 8 8 2.321503
## 9 9 2.392065
## 10 10 2.392065
## SimAnneal1 SimAnneal2 SimAnneal3 SimAnneal4 SimAnneal5 SimAnneal6 SimAnneal7
## 1 Koppy Koppy Koppy Koppy Koppy Koppy Koppy
## 2 Kioga Kioga Kioga Kioga Kioga Kioga Kioga
## 3 Kuai Kalleigh Kuai Kuai Kalleigh Kuai Kalleigh
## 4 Kalleigh Kyushu Kalleigh Kalleigh Kyushu Kalleigh Kyushu
## 5 Kyushu Kimora Kyushu Kyushu Kimora Kyushu Kimora
## 6 Kimora Kuai Kimora Kimora Kuai Kimora Kuai
## 7 Kibitz Kalani Kibitz Kalani Kibitz Kibitz Kibitz
## 8 Kolyma Kibitz Kolyma Kibitz Keira Kalani Keira
## 9 Keira Kolyma Keira Keira Kalani Kolyma Kolyma
## 10 Kale Kale Kalani Kolyma Kolyma Kale Kale
## 11 Kalani Keira Kale Kale Kale Keira Kalani
## SimAnneal8 SimAnneal9 SimAnneal10
## 1 Koppy Koppy Koppy
## 2 Kioga Kioga Kioga
## 3 Kuai Kalleigh Kalleigh
## 4 Kalleigh Kyushu Kyushu
## 5 Kyushu Kimora Kimora
## 6 Kimora Kuai Kuai
## 7 Kibitz Kibitz Kalani
## 8 Kolyma Keira Kibitz
## 9 Keira Kolyma Kolyma
## 10 Kale Kale Kale
## 11 Kalani Kalani Keira
After finding the simulated rank order, you can apply the rank order to your dominance probability matrix in order to generate a heatmap with its rows and columns ordered by rank. This heatmap will highlight areas of non-linear dominance rank (i.e. areas that are brown in the example below). A block of individuals with probabilities near 0.5 indicates a subgroup of individuals whose relationships are not clearly defined.
Besides ranking information, the undirected certainty information in
dominance relationship is also important. The values in the dominance
probability matrix range from 0-1 indicating both the direction of the
relationship and the certainty of the relationship. The function
valueConverter
converts all values to values between 0.5 -
1.0, which, in essence, removes directionality from the relationship
leaving a single metric of dominance certainty (0.5 indicates total
uncertainty and 1.0 indicates total certainty). For example,
# displaying the first 5 rows and columns of the converted matrix
valueConverter(DominanceProbability$p.hat)[1:5, 1:5]
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 1.0000000 0.5000000 0.5763875 0.5000000 0.5000000
## Kale 0.5000000 1.0000000 0.8819171 0.5000000 0.5959629
## Kalleigh 0.5763875 0.8819171 1.0000000 0.6325280 0.7095211
## Keira 0.5000000 0.5000000 0.6325280 1.0000000 0.5763875
## Kibitz 0.5000000 0.5959629 0.7095211 0.5763875 1.0000000
When you want to understand dyadic level dominance uncertainty, you
could get the information from dyadicLongConverter
. For
example,
# displaying the first 5 rows of Ranking Certainty
dyadicLongConverter(DominanceProbability$p.hat)[1:5, c("ID1", "ID2", "RankingCertainty")]
## ID1 ID2 RankingCertainty
## 12 Kalani Kale 0.5000000
## 23 Kalani Kalleigh 0.5763875
## 24 Kale Kalleigh 0.8819171
## 34 Kalani Keira 0.5000000
## 35 Kale Keira 0.5000000
When individual-level win-loss relationship uncertainty is the focus
of your question, the function individualDomProb
computes
the mean and standard deviation for individual-level win-loss
probability.
## ID Mean SD
## 1 Kalani 0.6079515 0.1705907
## 2 Kale 0.6403923 0.1628646
## 3 Kalleigh 0.7350389 0.1343705
## 4 Keira 0.6606999 0.1782413
## 5 Kibitz 0.6974806 0.1677272
## 6 Kimora 0.7269768 0.1736934
## 7 Kioga 0.7037738 0.1644045
## 8 Kolyma 0.6072364 0.1559790
## 9 Koppy 0.7157361 0.1433639
## 10 Kuai 0.6873502 0.1703318