This tutorial provides the data and the material for the workshop at the European Symposium on Suicide and Suicidal behavior 2018 in Gent, Belgium https://esssb17.org/.
In this tutorial, I will apply networkanalysis to the Beck Scale for suicide ideation, as done in our BJP article.
It contains updated analysis of our paper in BJP open. The updates are based on the recent blog of Sacha Epskamp. The data is available for download here.
The most relevant packages are qgraph, bootnet and mgm. Some code is based on pre-print material, such as a recent paper on 3-way interactions within the MGM package
As these packages and papers will likely be updated in the future, it might be that the code does not work anymore after a specific amount of time.
Be sure to check out the site of Eiko Fried, where most new innovations in the field of networkanalysis are presented in a comprehensible way!
To be clear, the goal of this tutorial is to help researchers in the field of suicide prevention get started with networkanalysis by applying some of the recent network techniques to data with information on suicidality. I did not develop any of the techniques, nor do I claim to fully understand all nitty gritty details. I am mainly enthousiastic about the potential of networkanalysis in the field of suicide prevention.
If you are serious about applying networkanalysis to your data, I would advice to contact the group of prof Borsboom.
For the workshop itself, it is convenient if you have installed the latest versions of the packages below before the workshop begins via install.packages(“nameofthepackage”).
When installed, you can load the package via:
library(mgm) ## package for Mixed Graphical modeling
library(devtools) ## package to load latest update of bootnet via github
install_github("sachaepskamp/bootnet")
library("bootnet") ## main package for estimating of network
library("qgraph") ## package for vizalisation of network
library("dplyr") ## package for data manipulation
library("haven") ## package to load SPSS data
library("summarytools") ## package for summarizing data
library("ggplot2") ## package foo
When we discuss networks within the realm of psychopathology, we are discussing the visualization and interpretation of an x by x association matrix. I deliberately use the word association, and not correlation, because in order to come to a sparse and interpretable network, we need to use penalization techniques on a correlation matrix (Epskamp & Fried 2016).Let’s first consider a simple, an empty 2 by 2 matrix:
one <- matrix ( c(0,0,
0,0), nrow=2,
ncol=2)
colnames(one) <- c("Node","Node")
qgraph(one, vsize = 15)
This code results in two nodes, that are not connected. We can connect the nodes by changing the matrix:
two <- matrix ( c(0,1,
1,0), nrow=2,
ncol=2)
colnames(two) <- c("Node","Node")
qgraph(two, vsize = 15)
The package qgraph allows to visualize the strength of the connection. If we change the connection between node 1 and node 2 by 3 (instead of the neutral 1), we find that the edge that connects the nodes becomes both green and thicker.
three <- matrix ( c(0,3,
3,0), nrow=2,
ncol=2)
colnames(three) <- c("Node","Node")
qgraph(three, layout = "circle", vsize = 15)
We can also visualize a negative relationship:
four <- matrix ( c(0,-3,
-3,0), nrow=2,
ncol=2)
colnames(four) <- c("Node","Node")
qgraph(four, layout = "circle", vsize = 15)
Nodes and edges can represent anything, for example, the relationship between depression (D) and rumination (R) or the number of published articles of the first author (D) and the third author (R) of this article. We can change the naming of the nodes of the previous matrix three using the following line:
five <- matrix ( c(0,3,
3,0), nrow=2,
ncol=2)
colnames(five) <- c("D","R")
qgraph(five, layout = "circle", vsize = 15)
A graph can be expanded by adding another row and column to the matrix. In this matrix, node 1 (named D) has a relationship of 4 with node 2 (named R) and of three with the third node, which we will call C. C and R are not directly related, but connected via D.
six <- matrix ( c(0,4,3,
4,0,0,
3,0,0), nrow=3,
ncol=3)
colnames(six) <- c("D","R", "C")
qgraph(six, layout = "circle", vsize = 15)
Node D is the most important node, as D is the only node that is linked to both R and C. R has a stronger link to D in comparison to C, so it is the second most central node. C automatically becomes the weakest link in the network. In qgraph, the so-called Furchterman Reingold alghoritm allows us to visualize this relative dependency(Fruchterman & Reingold 1991). It places the most central nodes in the middle of the graph, and least central nodes to the periphery. You can view the graph using the algorithm by changing the layout into “spring”.
Importantly, the centrality or importance of a node in a network can be quantified. In qgraph, three kinds of different, but highly related measures of centrality are implemented: betweenness, closeness and strength. More information about the separate definitions can be found in (Fried et al. 2016)
centralityPlot(six)
Now that we demonstrated the basics of matrix visualizing and centrality estimation, we can start estimating networks on actual data. We will use data from a sample of 367 patients treated for a suicide attempt in a Scottish hospital(O’Connor et al. 2015). Within 24 hours, the patients answered the Beck Scale for Suicide Ideation (BSS: Beck et al., 1979). For a more detailed description of the data, the design and the outcomes see (de Beurs et al. 2017; O’Connor et al. 2015).
The data is available for download here.
rm(list = ls())
It is important to set the working directory. This will be the location where you can store data and the code. You can use the code below by changing the path with your own path. You can also you the dropdown menu: session: set working directory.
setwd("C:/Users/user123/Dropbox/SUPER/Tutorial")
The downloaded text file that contains the data can be loaded into Rstudio as follows:
data <- read.table("Tutorial.txt" )
You will find that you often want to rename the variable names, as these will later on be used in the visualisation of the graph. In the code below, you change the names of the columns 3 to 22.
names(data) <-c("liv", "die", "rea", "des", "pas", "dur",
"fre", "att", "con", "det",
"cry", "pla", "met", "cou", "exp",
"pre", "not", "arr", "cea", "rep")
It is advised to always inspect the distribution of your data.
descr(data)
## Descriptive Statistics
## Data Frame: data
## N: 367
##
## liv die rea des pas dur fre att con
## ----------------- -------- -------- -------- -------- -------- -------- -------- -------- --------
## Mean 1.16 1.37 1.20 1.31 1.01 0.99 0.89 1.15 0.86
## Std.Dev 0.71 0.74 0.79 0.75 0.80 0.77 0.72 0.76 0.68
## Min 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## Q1 1.00 1.00 1.00 1.00 0.00 0.00 0.00 1.00 0.00
## Median 1.00 2.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
## Q3 2.00 2.00 2.00 2.00 2.00 2.00 1.00 2.00 1.00
## Max 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
## MAD 1.48 0.00 1.48 1.48 1.48 1.48 1.48 1.48 0.00
## IQR 1.00 1.00 1.00 1.00 2.00 2.00 1.00 1.00 1.00
## CV 0.61 0.54 0.66 0.57 0.79 0.78 0.81 0.66 0.79
## Skewness -0.24 -0.70 -0.36 -0.58 -0.02 0.02 0.16 -0.26 0.17
## SE.Skewness 0.13 0.13 0.13 0.13 0.13 0.13 0.13 0.13 0.13
## Kurtosis -1.01 -0.86 -1.33 -1.03 -1.44 -1.33 -1.07 -1.24 -0.85
## N.Valid 366.00 366.00 365.00 366.00 364.00 367.00 367.00 365.00 365.00
## Pct.Valid 99.73 99.73 99.46 99.73 99.18 100.00 100.00 99.46 99.46
##
## Table: Table continues below
##
##
##
## det cry pla met cou exp pre not arr
## ----------------- -------- -------- -------- -------- -------- -------- -------- -------- --------
## Mean 1.01 1.42 0.90 1.19 1.18 1.04 0.61 0.66 0.37
## Std.Dev 0.75 0.79 0.79 0.89 0.76 0.77 0.73 0.83 0.63
## Min 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## Q1 0.00 1.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00
## Median 1.00 2.00 1.00 2.00 1.00 1.00 0.00 0.00 0.00
## Q3 2.00 2.00 2.00 2.00 2.00 2.00 1.00 1.00 1.00
## Max 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
## MAD 1.48 0.00 1.48 0.00 1.48 1.48 0.00 0.00 0.00
## IQR 2.00 1.00 2.00 2.00 1.00 2.00 1.00 1.00 1.00
## CV 0.74 0.56 0.88 0.75 0.65 0.74 1.19 1.25 1.74
## Skewness -0.01 -0.89 0.18 -0.37 -0.32 -0.07 0.74 0.70 1.51
## SE.Skewness 0.13 0.13 0.13 0.13 0.13 0.13 0.13 0.13 0.13
## Kurtosis -1.21 -0.82 -1.40 -1.65 -1.23 -1.32 -0.79 -1.19 1.01
## N.Valid 361.00 365.00 367.00 365.00 365.00 365.00 366.00 367.00 367.00
## Pct.Valid 98.37 99.46 100.00 99.46 99.46 99.46 99.73 100.00 100.00
##
## Table: Table continues below
##
##
##
## cea rep
## ----------------- -------- --------
## Mean 1.00 0.26
## Std.Dev 0.85 0.44
## Min 0.00 0.00
## Q1 0.00 0.00
## Median 1.00 0.00
## Q3 2.00 1.00
## Max 2.00 1.00
## MAD 1.48 0.00
## IQR 2.00 1.00
## CV 0.85 1.71
## Skewness 0.01 1.11
## SE.Skewness 0.13 0.13
## Kurtosis -1.61 -0.76
## N.Valid 367.00 367.00
## Pct.Valid 100.00 100.00
For the further demonstration of new techniques, I selected only the first 5 items of the BSS and called the subset “Sub”. Note that we did not use this subset in the article. I selected these 5 for sake of parsimony.
Sub <- data[1:5]
To facilitate graphical interpretation, it is possible to colour the variables which you expect to cluster. In this study we wanted to see whether the more cognitive or motivational items of the BSS (such as frequency of suicide ideation) and the more action prone or volitional items (such as actual preparing of an attempt) cluster. Importantly, this grouping does not influences the placing of the nodes! It only gives the nodes that you group the same colour. You can also let a clustering algoritm search for clusters, for example via the package EGA. That is beyond this tutorial, more info can be found via a blog of Eiko Fried.
groups <- structure(list(Motivational = c(1,2,3,4,5,6,7,8, 11),
Volititional = c(9,10,12,13,14,15,16,17,18,19),
FutureAttempt = c(20)),
Names = c("motivational",
"volitional", "Repeat suicidal behaviour"))
Different kind of variables ask for different kind of estimation method. When data are continous and normally distributed, we use the gaussian graphical model. This can also be used for ordinal variables, when cormethod is set to “cor_auto”.
A so-called L1-regularization parameter is used to get a sparse network:
Network <-estimateNetwork(data,default = "EBICglasso" , threshold = FALSE, corMethod = "cor_auto")
## Warning in EBICglassoCore(S = S, n = n, gamma = gamma, penalize.diagonal
## = penalize.diagonal, : A dense regularized network was selected (lambda <
## 0.1 * lambda.max). Recent work indicates a possible drop in specificity.
## Interpret the presence of the smallest edges with care. Setting threshold =
## TRUE will enforce high specificity, at the cost of sensitivity.
The package qgraph is used to plot the network. One can use either layout = ‘spring’ or layout is ‘circle’. Spring places the nodes using the Furchterman Reingold alghoritm.
plot(Network, groups = groups, layout = 'spring')
A key concept in network analysis is the centrality of the symptom: if a symptom (e.g. fatigue) has many and/or strong associations to other symptoms, they are more central within the network than a less connected symptom.There are different centrality metrics, but strenght is most often used. Expected influence is related to strength but argued to be a better metric.
centralityPlot(Network, include = c("Strength","ExpectedInfluence"),
orderBy = "ExpectedInfluence")
A new function in qgraph lets you display the network from one node. For some reason, I have to explicitly estimate the network using qgraph. Will have to ask Sasha or Eiko about that.
g2 <- qgraph(cor_auto(data), graph = "glasso", sampleSize = nrow(data),
layout = "spring", theme = "colorblind",
cut = 0, groups = groups)
flow(g2, "rep", theme = "colorblind", vsize = 4)
In the latest version of bootnet, new estimation methods are available. Bellow I compared three: standard EBISglasso, a more conservative EBICglasso with a threshold, and unreguralized estimation.
Explanation of GLASSO can be found here.
layout(t(1:3))
g1 <- qgraph(cor_auto(Sub), graph = "glasso", sampleSize = nrow(Sub),
theme = "colorblind", title = "EBICglasso",
cut = 0)
g2 <- qgraph(cor_auto(Sub), graph = "glasso", sampleSize = nrow(Sub),
theme = "colorblind", title = "EBICglasso_threshold", threshold = TRUE,
cut = 0)
g3 <- estimateNetwork(Sub,
default = "ggmModSelect",
stepwise = TRUE,
corMethod = "cor")
plot(g3, title = "ggModSelect", layout = "circle")