Preamble

Here are the libraries we need.

## load required libraries

# uncomment this if you haven't installed something
# install.packages()
library(knitr)
library(OpenMx)
library(beepr)
library(polycor)

Categorical Theory

We all remember the basic levels of measurement: nominal, ordinal, interval, and ratio.

Today is not about nominal data, which rarely happens in psychology or SEM. It is also not about ratio data, because you’re psychologists (except those of you who aren’t). Today is about how to deal with ordinal data in SEM (with the caveat that binary data is automatically ordinal). We typically call ordinal data “categorical”, while calling interval or ratio data “continuous”.

Here’s the big question before we start: does it really matter if we treat ordinal data as continuous?

Most psychological data is “intervalish”, where data are clearly ordinal, but it might not make that much of a difference if you treat the data as continuous. Rhemtulla et al. (2012) ran a simulation showing what happens when you pretend ordinal data are continuous under varying conditions.

The biggest issue is the number of categories: the fewer categories you have, the more it matters that you treat them as ordinal.

For the rest of today, we’re going to be approaching ordinal variables as though they are single indicators of a continuous latent variable. It will generally improve the accuracy of our results at a considerable computational cost. Then we’ll talk about other solutions, specifically alternatives to ML that are faster.

The actual theory underneath ordinal SEM

Let’s do this in reverse. We have a single item: “How happy are you at your job?” which is measured with an unanchored five-point Likert scale. However, we hypothesize that job satisfaction is really continuous, and if we had a better instrument, we’d get continuous data. That continuous data might look something like this:

# death date of wojtek the bear,
#  a brown bear that was drafted into the Polish army in WW2
#  https://en.wikipedia.org/wiki/Wojtek_(bear)
set.seed(12021963)

# do you know who really liked his job? wojtek
jobSat <- rnorm(674, mean=3, sd=1.5)
hist(jobSat, breaks=60)

# but we only get numbers from one to five
hist(jobSat, breaks=60)
abline(v=c(1.5,2.5,3.5,4.5), lwd=3)
text(1:5, 33, 1:5, cex=2)

# so anyone whose true score is lower than 1.5 will say 1 on a 1-5 scale.

# however, those thresholds could be anywhere, and they don't have to be evenly spaced
hist(jobSat, breaks=60)
abline(v=c(-.5, 2, 2.5, 4.1), lwd=3)
text(c(-2, 1, 2.2, 3.4, 5), 33, 1:5, cex=2)

The positions of those lines are called thresholds, and they describe the scores on the unmeasured continuous latent variable where you transition between observed categories. There are always \(k-1\) thresholds for \(k\) categories. We can freely estimate those thresholds if we treat the underlying latent variable like any other latent variable and impose two constraints (most often on the mean and variance) for identification.

With complete data, these thresholds can be derived analytically. If 10% of scores are in the lowest category, then the first threshold is at qnorm(.10) = -1.2815516, about 1.28 standard deviations below the mean of the latent variable. If the mean and variance of the latent continuous variable are 0 and 1, then the threshold is just -1.2815516.
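As a quick sketch of that calculation, here are the thresholds implied by some made-up category proportions (the proportions are hypothetical, not from our data):

# made-up proportions for a five-category item: 10% / 20% / 40% / 20% / 10%
props <- c(.10, .20, .40, .20, .10)

# cumulative proportions -> normal quantiles = the k - 1 thresholds, in z units
qnorm(cumsum(props)[-5])
# roughly -1.28, -0.52, 0.52, and 1.28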

With missing data, the thresholds must be estimated, and FIML will give you better estimates of the thresholds due to the covariances between items. Put differently, if someone is missing on the job satisfaction item, but reports a 5 out of 5 on all other items from the same factor, then the estimated threshold should account for the fact that this person likely scored a 5 (maybe a 4, definitely not a 1) on job satisfaction.

Covariances between the latent continuous variables are always estimated too, just like they were in polycor when we did EFA like 10 years ago. Let’s review/learn how this should work. If our data were continuous, we’d get this:

# job satisfaction exists. how about some measure of boss performance?
set.seed(12021963)
# note: reusing the seed means this "noise" recycles the exact same random draws
#  as jobSat, so bossRespect ends up a perfect linear function of jobSat
bossRespect <- .6 * jobSat + 1.3*rnorm(length(jobSat), 0, 1)
cor(jobSat, bossRespect)
## [1] 1
# plot them
plot(jobSat, bossRespect)

# categorize them
abline(v=c(0, 1.5, 3.25, 4), lwd=2)
abline(h=c(-1, 0.66, 1.9), lwd=2)

# actually code
jobSatCat <- 1 + (jobSat>0) + (jobSat>1.5) + (jobSat>3.25) + (jobSat>4)
bossRespectCat <- 1 + (bossRespect>(-1)) + (bossRespect>.66) + (bossRespect>1.9)

cor(jobSatCat, bossRespectCat)
## [1] 0.8636247

So we can see (a) the relationship between the continuous and ordinal versions, and (b) the attenuation of the correlation between these two variables when we violate assumptions. Just like we have pnorm to give us the probability of specific values and ranges from the (univariate) normal distribution, we have multivariate versions that account for not only the means and variances, but also the covariance between the two variables.

# here's a frequency table
table(jobSatCat, bossRespectCat)
##          bossRespectCat
## jobSatCat   1   2   3   4
##         1  20   0   0   0
##         2  64  53   0   0
##         3   0  97 140  27
##         4   0   0   0 124
##         5   0   0   0 149
# and here it is as percentages/proportions
length(jobSatCat)
## [1] 674
table(jobSatCat, bossRespectCat)/length(jobSatCat)
##          bossRespectCat
## jobSatCat          1          2          3          4
##         1 0.02967359 0.00000000 0.00000000 0.00000000
##         2 0.09495549 0.07863501 0.00000000 0.00000000
##         3 0.00000000 0.14391691 0.20771513 0.04005935
##         4 0.00000000 0.00000000 0.00000000 0.18397626
##         5 0.00000000 0.00000000 0.00000000 0.22106825
# just need to see which correlations generate expected joint probabilities that best match up with this table

# thankfully, the library polycor and its function polychor (?) do this for us
polychor(jobSatCat, bossRespectCat)
## Warning in optimise(f, interval = c(-maxcor, maxcor)): NA/Inf replaced by
## maximum positive value

## Warning in optimise(f, interval = c(-maxcor, maxcor)): NA/Inf replaced by
## maximum positive value

## Warning in optimise(f, interval = c(-maxcor, maxcor)): NA/Inf replaced by
## maximum positive value
## [1] 0.9828255
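For the curious, here is a rough sketch of the machinery underneath polychor (this assumes the mvtnorm package is installed; it is an illustration of the idea, not polycor’s actual code):

library(mvtnorm)

# thresholds implied by the marginal proportions of each variable
cpJob  <- cumsum(prop.table(table(jobSatCat)))
cpBoss <- cumsum(prop.table(table(bossRespectCat)))
tJob   <- c(-Inf, qnorm(cpJob[-length(cpJob)]),   Inf)
tBoss  <- c(-Inf, qnorm(cpBoss[-length(cpBoss)]), Inf)

# expected proportion in cell (i, j) under a candidate latent correlation rho
expectedCell <- function(rho, i, j) {
  pmvnorm(lower = c(tJob[i],     tBoss[j]),
          upper = c(tJob[i + 1], tBoss[j + 1]),
          corr  = matrix(c(1, rho, rho, 1), 2, 2))
}

# compare with the observed 0.095 in the (2, 1) cell of the table above;
# polychor() searches for the rho whose expected cells best match the whole table
expectedCell(.98, 2, 1)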

Implementing Ordinal/Categorical SEM

Categorical SEM requires that we specify which variables are categorical and model their thresholds. First, we tell R that some of our data are ordinal categories. R calls these “factors”, because it’s always nice to have the same word mean different things in the same context. There are two ways to tell R something is a factor. First, there is a factor() function, which takes arguments to specify both what the categories are and in what order. By default, R will sort the categories (alphabetically, for character data), which is rarely the substantive order you want.
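A quick illustration with a made-up character vector (not from our data):

# factor() sorts the levels unless told otherwise, which is rarely the
#  substantive order you want for an ordinal item
x <- c("sometimes", "never", "always", "never")
factor(x)                                              # levels: always, never, sometimes
factor(x, levels = c("never", "sometimes", "always"))  # levels in the order we specify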

As this is a minor pain to move around, someone wrote mxFactor, which takes two arguments: the data, and what you want the categories to be (in order).

# load old data
data(myFADataRaw)

# grab just the ordinal variables
oneFactorOrd <- myFADataRaw[,c("z1", "z2", "z3")]
summary(oneFactorOrd)
##        z1             z2             z3       
##  Min.   :0.00   Min.   :0.00   Min.   :0.000  
##  1st Qu.:1.00   1st Qu.:0.00   1st Qu.:0.000  
##  Median :1.00   Median :0.00   Median :1.000  
##  Mean   :0.82   Mean   :0.48   Mean   :0.782  
##  3rd Qu.:1.00   3rd Qu.:1.00   3rd Qu.:1.000  
##  Max.   :1.00   Max.   :1.00   Max.   :2.000
# for caroline
oneFactorOrd$z1[1:10] <- NA

# use mxFactor to specify that these are factors
oneFactorOrd$z1 <- mxFactor(oneFactorOrd$z1, levels=c(0, 1))
oneFactorOrd$z2 <- mxFactor(oneFactorOrd$z2, levels=c(0, 1))
oneFactorOrd$z3 <- mxFactor(oneFactorOrd$z3, levels=c(0, 1, 2))

# look at what's different!
summary(oneFactorOrd)
##     z1      z2      z3     
##  0   : 89   0:260   0:157  
##  1   :401   1:240   1:295  
##  NA's: 10           2: 48

Matrix Specification

Now, whenever we enter these data in a model, we have to account for the fact that they are ordinal. We have to provide estimates for these thresholds, which we again do in two ways. First, with a matrix!

thresh       <- mxMatrix( type="Full", nrow=2, ncol=3,
                          free=c(TRUE,TRUE,TRUE,FALSE,FALSE,TRUE),
                          values=c(-1,0,-.5,NA,NA,1.2), byrow=TRUE, name="thresh" )

Notes:

  • The threshold matrix is 2 x 3. There is a column for every categorical variable, and a row for every possible threshold.

  • Variable z3 needs two thresholds, so there are two rows. Variables z1 and z2 need one threshold, so they leave the second row empty. Note that these extra spots in the threshold matrix are always fixed, never free.

  • Other thresholds may be freely estimated.

  • Thresholds within a variable (column) must be strictly increasing.

  • No one cares how many categories you meant to collect. If you have a seven-point Likert, but no one ever chooses 5, you have a six-point Likert.

Once you have this matrix, it is time to put it in the model. Just as we had A, S, F, and M, both the RAM and normal expectations provide a slot for thresholds, as shown.

mxExpectationRAM(A="A", S="S", F="F", M="M", thresholds="thresh")
## MxExpectationRAM 'expectation' 
## $A : 'A' 
## $S : 'S' 
## $F : 'F' 
## $M : 'M' 
## $dims : NA 
## $thresholds : 'thresh'

Trying this out with paths

Most of the time, you will use paths. Just as we use mxPath to draw paths, we use mxThreshold to create thresholds. First, let’s make a model:

simpCat <- mxModel("three categorical items",
  type='RAM',
  mxData(oneFactorOrd, "raw"),
  manifestVars=c("z1", "z2", "z3")
  )
simpCat
## MxModel 'three categorical items' 
## type : RAM 
## $matrices : 'A', 'S', and 'F' 
## $algebras : NULL 
## $constraints : NULL 
## $intervals : NULL 
## $latentVars : none
## $manifestVars : 'z1', 'z2', and 'z3' 
## $data : 500 x 3 
## $data means : NA
## $data type: 'raw' 
## $submodels : NULL 
## $expectation : MxExpectationRAM 
## $fitfunction : MxFitFunctionML 
## $compute : NULL 
## $independent : FALSE 
## $options :  
## $output : FALSE

Note what is missing. M and Thresholds have not been created, and won’t be until we add paths/thresholds that reference them. Now let’s complete the rest of the model. I’m going to fix the item means/variances for identification. Then, I add the thresholds using mxThreshold.

simpCat <- mxModel("three categorical items",
  type='RAM',
  mxData(oneFactorOrd, "raw"),
  manifestVars=c("z1", "z2", "z3"),
  latentVars="factor",
  # variances
  mxPath(c("z1", "z2", "z3"), arrows=2, values=1, free=FALSE),
  # covariances
  #mxPath(c("z1", "z2", "z3"), connect="unique.bivariate",
  #       arrows=2, values=.5, free=TRUE,
   #      labels=c("c12", "c13", "c23")),
  mxPath("factor", to=c("z1", "z2", "z3"), 
         arrows=1, values=.8,
         free=TRUE, 
         labels=c("lambdaZ1", "lambdaZ2", "lambdaZ3")),
  mxPath("factor", arrows=2, values=1, free=FALSE),
  # means
  mxPath("one", c("z1", "z2", "z3"), arrows=1, values=0, free=FALSE),
  # thresholds
  mxThreshold(c("z1", "z2", "z3"), nThresh = c(1, 1, 2), free=TRUE,
              labels=c("z1t1",  "z2t1", "z3t1", "z3t2"))
  )
simpCat
## MxModel 'three categorical items' 
## type : RAM 
## $matrices : 'A', 'S', 'F', 'M', and 'Thresholds' 
## $algebras : NULL 
## $constraints : NULL 
## $intervals : NULL 
## $latentVars : 'factor' 
## $manifestVars : 'z1', 'z2', and 'z3' 
## $data : 500 x 3 
## $data means : NA
## $data type: 'raw' 
## $submodels : NULL 
## $expectation : MxExpectationRAM 
## $fitfunction : MxFitFunctionML 
## $compute : NULL 
## $independent : FALSE 
## $options :  
## $output : FALSE
simpCat$M
## FullMatrix 'M' 
## 
## $labels: No labels assigned.
## 
## $values
##      z1 z2 z3 factor
## [1,]  0  0  0      0
## 
## $free: No free parameters.
## 
## $lbound: No lower bounds assigned.
## 
## $ubound: No upper bounds assigned.
simpCat$Thresholds
## FullMatrix 'Thresholds' 
## 
## $labels
##      z1     z2     z3    
## [1,] "z1t1" "z2t1" "z3t1"
## [2,] NA     NA     "z3t2"
## 
## $values
##      z1 z2         z3
## [1,]  0  0 -0.4307273
## [2,]  0  0  0.4307273
## 
## $free
##         z1    z2   z3
## [1,]  TRUE  TRUE TRUE
## [2,] FALSE FALSE TRUE
## 
## $lbound: No lower bounds assigned.
## 
## $ubound: No upper bounds assigned.

Notes:

  • mxThreshold creates a threshold matrix, called Thresholds.

  • Note that free=TRUE marks only the valid thresholds as freely estimated; the unused cells stay fixed.

  • If you do not give starting values (tsk tsk), OpenMx will place them at the normal quantiles assuming all categories are used equally often. So, binary puts the single threshold at 0, ternary puts them at plus/minus .43, etc. (quick check below).

  • You can and should use labels.
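A quick check of where those default starting values come from:

# normal quantiles under equal category use
qnorm(1/2)          # binary: one threshold, at 0
qnorm(c(1/3, 2/3))  # three categories: roughly -0.43 and +0.43 (see z3 above)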

How’d this go?

# run the model
threshOut <- mxRun(simpCat)
## Running three categorical items with 7 parameters
summary(threshOut)
## Summary of three categorical items 
##  
## free parameters:
##       name     matrix row col    Estimate  Std.Error A
## 1 lambdaZ1          A   1   4  1.47355556 0.29849911  
## 2 lambdaZ2          A   2   4  1.29946007 0.21447096  
## 3 lambdaZ3          A   3   4  1.60887975 0.32256699  
## 4     z1t1 Thresholds   1   1 -1.60575353 0.24045528  
## 5     z2t1 Thresholds   1   2  0.07616229 0.09193602  
## 6     z3t1 Thresholds   1   3 -0.92062395 0.16955756  
## 7     z3t2 Thresholds   2   3  2.48378670 0.37855822  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:              7                   1483              1813.581
##    Saturated:              9                   1481                    NA
## Independence:              6                   1484                    NA
## Number of observations/statistics: 500/1490
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:      -1152.419               1827.581                 1827.809
## BIC:      -7402.682               1857.084                 1834.865
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2022-02-05 09:33:25 
## Wall clock time: 0.04426789 secs 
## optimizer:  CSOLNP 
## OpenMx version number: 2.14.11 
## Need help?  See help(mxSummary)
mxStandardizeRAMpaths(threshOut)
##                             name    label matrix    row    col Raw.Value
## 1 three categorical items.A[1,4] lambdaZ1      A     z1 factor  1.473556
## 2 three categorical items.A[2,4] lambdaZ2      A     z2 factor  1.299460
## 3 three categorical items.A[3,4] lambdaZ3      A     z3 factor  1.608880
## 4 three categorical items.S[1,1]     <NA>      S     z1     z1  1.000000
## 5 three categorical items.S[2,2]     <NA>      S     z2     z2  1.000000
## 6 three categorical items.S[3,3]     <NA>      S     z3     z3  1.000000
## 7 three categorical items.S[4,4]     <NA>      S factor factor  1.000000
##          Raw.SE Std.Value        Std.SE
## 1 not_requested 0.8274530 not_requested
## 2 not_requested 0.7925016 not_requested
## 3 not_requested 0.8493124 not_requested
## 4 not_requested 0.3153215 not_requested
## 5 not_requested 0.3719413 not_requested
## 6 not_requested 0.2786684 not_requested
## 7 not_requested 1.0000000 not_requested

Joint Ordinal-Continuous ML

Historically, you’ve had to switch between continuous data ML models and categorical data models. However, modern tools allow you to mix and match continuous and ordinal data. Here we’re adding several of the continuous X variables from our demo data.

# get the data
oneFactorJoint <- myFADataRaw[,c("x1","x2","x3","z1","z2","z3")]

# make the ordinal variables
#  (note we reuse the already-converted columns from oneFactorOrd, NAs and all)
oneFactorJoint$z1 <- mxFactor(oneFactorOrd$z1, levels=c(0,1))
oneFactorJoint$z2 <- mxFactor(oneFactorOrd$z2, levels=c(0,1))
oneFactorJoint$z3 <- mxFactor(oneFactorOrd$z3, levels=c(0,1,2))

# make life easier
xVar <- c("x1","x2","x3")
zVar <- c("z1","z2","z3")

simpJoint <- mxModel("One Factor Six Items",
  type='RAM',
  mxData(oneFactorJoint, "raw"),
  manifestVars=c(xVar, zVar),
  latentVars="f",
  ## variances
  # variances: z are fixed
  mxPath(zVar, arrows=2, values=1, free=FALSE),
  # variances: x are free
  mxPath(xVar, arrows=2, values=1, free=TRUE, labels=c("e1", "e2", "e3")),
  # variances: factor is fixed
  mxPath("f", arrows=2, values=1, free=FALSE),
  ## loadings:
  mxPath("f", c(xVar, zVar), values=.6, free=TRUE,
        labels=paste("lambda", c(xVar, zVar), sep="_")),
  ## means: z is fixed, x is free
  mxPath("one", zVar, arrows=1, values=0, free=FALSE),
  mxPath("one", xVar, arrows=1, values=0, free=TRUE, labels=c("m1", "m2", "m3")),
  # thresholds
  mxThreshold(zVar, nThresh = c(1, 1, 2), free=TRUE)
  )
simpJoint
## MxModel 'One Factor Six Items' 
## type : RAM 
## $matrices : 'A', 'S', 'F', 'M', and 'Thresholds' 
## $algebras : NULL 
## $constraints : NULL 
## $intervals : NULL 
## $latentVars : 'f' 
## $manifestVars : 'x1', 'x2', 'x3', 'z1', 'z2', and 'z3' 
## $data : 500 x 6 
## $data means : NA
## $data type: 'raw' 
## $submodels : NULL 
## $expectation : MxExpectationRAM 
## $fitfunction : MxFitFunctionML 
## $compute : NULL 
## $independent : FALSE 
## $options :  
## $output : FALSE
jointRes <- mxRun(simpJoint)
## Running One Factor Six Items with 16 parameters
jointRefModels <- mxRefModels(jointRes, run=TRUE)
## Running Saturated One Factor Six Items with 25 parameters
## Running Independence One Factor Six Items with 10 parameters
summary(jointRes, refModels=jointRefModels)
## Summary of One Factor Six Items 
##  
## free parameters:
##                                    name     matrix row col    Estimate
## 1                             lambda_x1          A   1   7 -0.80432360
## 2                             lambda_x2          A   2   7 -0.79324082
## 3                             lambda_x3          A   3   7 -0.75608369
## 4                             lambda_z1          A   4   7 -0.28209557
## 5                             lambda_z2          A   5   7 -0.23699108
## 6                             lambda_z3          A   6   7 -0.21029422
## 7                                    e1          S   1   1  0.34801624
## 8                                    e2          S   2   2  0.39330374
## 9                                    e3          S   3   3  0.41004905
## 10                                   m1          M   1   1  2.98813184
## 11                                   m2          M   1   2  3.01152140
## 12                                   m3          M   1   3  2.98628305
## 13 One Factor Six Items.Thresholds[1,1] Thresholds   1   1 -0.94213284
## 14 One Factor Six Items.Thresholds[1,2] Thresholds   1   2  0.05139727
## 15 One Factor Six Items.Thresholds[1,3] Thresholds   1   3 -0.49386745
## 16 One Factor Six Items.Thresholds[2,3] Thresholds   2   3  1.32970654
##     Std.Error A
## 1  0.04115534  
## 2  0.04192112  
## 3  0.04126926  
## 4  0.08023232  
## 5  0.06524504  
## 6  0.05896615  
## 7  0.03742011  
## 8  0.03854482  
## 9  0.03730083  
## 10 0.04461955  
## 11 0.04523319  
## 12 0.04432049  
## 13 0.07015752  
## 14 0.05762763  
## 15 0.05999381  
## 16 0.07905732  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:             16                   2974              5715.821
##    Saturated:             25                   2965              5480.597
## Independence:             10                   2980              6313.382
## Number of observations/statistics: 500/2990
## 
## chi-square:  χ² ( df=9 ) = 235.2244,  p = 1.305043e-45
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:      -232.1791               5747.821                 5748.947
## BIC:    -12766.4236               5815.255                 5764.470
## CFI: 0.7233695 
## TLI: 0.5389491   (also known as NNFI) 
## RMSEA:  0.2242144  [95% CI (0.1952074, 0.25416)]
## Prob(RMSEA <= 0.05): 0
## timestamp: 2022-02-05 09:33:27 
## Wall clock time: 1.489409 secs 
## optimizer:  CSOLNP 
## OpenMx version number: 2.14.11 
## Need help?  See help(mxSummary)

That fits poorly. Maybe it’s two factors?

In-class assignment

simpJoint2 <- mxModel("Two Factor Six Items",
  type='RAM',
  mxData(oneFactorJoint, "raw"),
  manifestVars=c(xVar, zVar),
  latentVars=c("facX", "facZ"),
  ## variances
  # variances: z are fixed
  mxPath(zVar, arrows=2, values=1, free=FALSE),
  # variances: x are free
  mxPath(xVar, arrows=2, values=1, free=TRUE, labels=c("e1", "e2", "e3")),
  # variances: factor is fixed
  mxPath("facX", arrows=2, values=1, free=FALSE),
  mxPath("facZ", arrows=2, values=1, free=FALSE),
  # factor covariance is estimated
  mxPath("facX", "facZ", arrows=2, values=.5, free=TRUE, label="facCov"),
  ## loadings:
  mxPath("facX", xVar, values=.6, free=TRUE,
        labels=paste("lambda", xVar, sep="_")),
  mxPath("facZ", zVar, values=.6, free=TRUE,
        labels=paste("lambda", zVar, sep="_")),
  ## means: z is fixed, x is free
  mxPath("one", zVar, arrows=1, values=0, free=FALSE),
  mxPath("one", xVar, arrows=1, values=0, free=TRUE, labels=c("m1", "m2", "m3")),
  # thresholds
  mxThreshold(zVar, nThresh = c(1, 1, 2), free=TRUE)
  )
simpJoint2
## MxModel 'Two Factor Six Items' 
## type : RAM 
## $matrices : 'A', 'S', 'F', 'M', and 'Thresholds' 
## $algebras : NULL 
## $constraints : NULL 
## $intervals : NULL 
## $latentVars : 'facX' and 'facZ' 
## $manifestVars : 'x1', 'x2', 'x3', 'z1', 'z2', and 'z3' 
## $data : 500 x 6 
## $data means : NA
## $data type: 'raw' 
## $submodels : NULL 
## $expectation : MxExpectationRAM 
## $fitfunction : MxFitFunctionML 
## $compute : NULL 
## $independent : FALSE 
## $options :  
## $output : FALSE
jointRes2 <- mxRun(simpJoint2)
## Running Two Factor Six Items with 17 parameters
summary(jointRes2, refModels=jointRefModels)
## Summary of Two Factor Six Items 
##  
## free parameters:
##                                    name     matrix row col    Estimate
## 1                             lambda_x1          A   1   7  0.80561005
## 2                             lambda_x2          A   2   7  0.79578208
## 3                             lambda_x3          A   3   7  0.75899629
## 4                             lambda_z1          A   4   8  1.47471526
## 5                             lambda_z2          A   5   8  1.32812671
## 6                             lambda_z3          A   6   8  1.54612092
## 7                                    e1          S   1   1  0.34625304
## 8                                    e2          S   2   2  0.38956509
## 9                                    e3          S   3   3  0.40590823
## 10                               facCov          S   7   8  0.21776577
## 11                                   m1          M   1   1  2.98791303
## 12                                   m2          M   1   2  3.01130545
## 13                                   m3          M   1   3  2.98607725
## 14 Two Factor Six Items.Thresholds[1,1] Thresholds   1   1 -1.60652409
## 15 Two Factor Six Items.Thresholds[1,2] Thresholds   1   2  0.08043287
## 16 Two Factor Six Items.Thresholds[1,3] Thresholds   1   3 -0.89157299
## 17 Two Factor Six Items.Thresholds[2,3] Thresholds   2   3  2.40920499
##     Std.Error A
## 1  0.04132511  
## 2  0.04201742  
## 3  0.04135989  
## 4  0.29376983  
## 5  0.21863098  
## 6  0.29055359  
## 7  0.03782537  
## 8  0.03875186  
## 9  0.03745735  
## 10 0.05718369  
## 11 0.04461541  
## 12 0.04522921  
## 13 0.04431680  
## 14 0.23767099  
## 15 0.09330719  
## 16 0.15604905  
## 17 0.33926001  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:             17                   2973              5493.192
##    Saturated:             25                   2965              5480.597
## Independence:             10                   2980              6313.382
## Number of observations/statistics: 500/2990
## 
## chi-square:  χ² ( df=8 ) = 12.59555,  p = 0.1265438
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:      -452.8079               5527.192                 5528.462
## BIC:    -12982.8378               5598.840                 5544.881
## CFI: 0.9943805 
## TLI: 0.9894634   (also known as NNFI) 
## RMSEA:  0.03389526  [95% CI (0, 0.07336028)]
## Prob(RMSEA <= 0.05): 0.7476977
## timestamp: 2022-02-05 09:33:29 
## Wall clock time: 0.9125268 secs 
## optimizer:  CSOLNP 
## OpenMx version number: 2.14.11 
## Need help?  See help(mxSummary)

Problems with ordinal ML

The primary issue with ordinal FIML is that it is computationally intensive. At each major and minor iteration, the expected covariance matrix must be integrated to generate expected probabilities for each person’s responses. This is incredibly expensive, and the cost grows exponentially with the number of ordinal variables because each one adds a dimension to the numerical integration.

As a general rule, each additional ordinal variable increases processing time by up to a factor of 8. How bad is that? If a ten-variable model can be estimated in 1 second:
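Here’s a rough back-of-the-envelope version of that rule, assuming the full factor of 8 for every added variable (numbers below are illustrative, not benchmarks):

# if a 10-variable all-ordinal model takes 1 second, adding ordinal variables
#  at a factor of 8 apiece gets out of hand quickly
extra <- 1:10
data.frame(nOrdinal = 10 + extra,
           seconds  = 8^extra,
           days     = round(8^extra / (60 * 60 * 24), 2))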

Thankfully, most 10-variable models take less than 1 second (or at least the integration part does). The algorithm that OpenMx uses for multivariate normal integration is hard-capped at 20 variables. If you have more than 20 ordinal variables, you must use another tool or another estimator (like the WLS we’ll cover shortly).

When not to use this

So the big question here is “does this really matter?” Sometimes it will, and sometimes it won’t. The best way to check is to run things both ways and see what changes. Here’s the one factor model again, this time ignoring any categorical data assumptions.

# did he forget about the bear joke thing?
# get the data
oneFactorIgnorant <- myFADataRaw[,c("x1","x2","x3","z1","z2","z3")]

# make life easier
xVar <- c("x1","x2","x3")
zVar <- c("z1","z2","z3")

simpIgnorant <- mxModel("One Factor Six Items No Assumptions",
  type='RAM',
  mxData(oneFactorIgnorant, "raw"),
  manifestVars=c(xVar, zVar),
  latentVars="f",
  ## variances
  # variances: all manifest variances are free (we're ignoring the categorical nature of z)
  mxPath(c(xVar, zVar), arrows=2, values=1, free=TRUE, labels=c("e1", "e2", "e3", "e4", "e5", "e6")),
  # variances: factor is fixed
  mxPath("f", arrows=2, values=1, free=FALSE),
  # loadings:
  mxPath("f", c(xVar, zVar), values=.6, free=TRUE,
        labels=paste("lambda", c(xVar, zVar), sep="_")),
  # means: all free
  mxPath("one", zVar, arrows=1, values=0, free=TRUE, labels=c("m4", "m5", "m6")),
  mxPath("one", xVar, arrows=1, values=0, free=TRUE, labels=c("m1", "m2", "m3"))
  )

ignorantRes <- mxRun(simpIgnorant)
## Running One Factor Six Items No Assumptions with 18 parameters
ignorantRefModels <- mxRefModels(ignorantRes, run=TRUE)
## Running Saturated One Factor Six Items No Assumptions with 27 parameters
## Running Independence One Factor Six Items No Assumptions with 12 parameters
summary(ignorantRes, refModels=ignorantRefModels)
## Summary of One Factor Six Items No Assumptions 
##  
## free parameters:
##         name matrix row col   Estimate   Std.Error A
## 1  lambda_x1      A   1   7 0.80465798 0.041211658  
## 2  lambda_x2      A   2   7 0.79386836 0.041971757  
## 3  lambda_x3      A   3   7 0.75617502 0.041316129  
## 4  lambda_z1      A   4   7 0.06461368 0.018837732  
## 5  lambda_z2      A   5   7 0.09076110 0.024504197  
## 6  lambda_z3      A   6   7 0.10755475 0.029662385  
## 7         e1      S   1   1 0.34779789 0.037471792  
## 8         e2      S   2   2 0.39261872 0.038590883  
## 9         e3      S   3   3 0.41019338 0.037324557  
## 10        e4      S   4   4 0.14342500 0.009129564  
## 11        e5      S   5   5 0.24136236 0.015382806  
## 12        e6      S   6   6 0.35090791 0.022366233  
## 13        m1      M   1   1 2.98795172 0.044615488  
## 14        m2      M   1   2 3.01134367 0.045229292  
## 15        m3      M   1   3 2.98611381 0.044316873  
## 16        m4      M   1   4 0.81999994 0.017181381  
## 17        m5      M   1   5 0.47999985 0.022342804  
## 18        m6      M   1   6 0.78199995 0.026924962  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:             18                   2982              5759.513
##    Saturated:             27                   2973              5529.256
## Independence:             12                   2988              6355.500
## Number of observations/statistics: 500/3000
## 
## chi-square:  χ² ( df=9 ) = 230.2564,  p = 1.452904e-44
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:      -204.4873               5795.513                 5796.935
## BIC:    -12772.4487               5871.376                 5814.243
## CFI: 0.7272628 
## TLI: 0.545438   (also known as NNFI) 
## RMSEA:  0.2217388  [95% CI (0.1927306, 0.2516949)]
## Prob(RMSEA <= 0.05): 0
## timestamp: 2022-02-05 09:33:29 
## Wall clock time: 0.0297761 secs 
## optimizer:  CSOLNP 
## OpenMx version number: 2.14.11 
## Need help?  See help(mxSummary)
# what changed?

summary(jointRes, refModels=jointRefModels)$CFI
## [1] 0.7233695
summary(ignorantRes, refModels=ignorantRefModels)$CFI
## [1] 0.7272628

Ignoring the categorical nature gave us mostly the right answer. Why? Two of the three ordinal variables are binary, so there’s only one threshold each and not much to distort. I’ll never tell you to ignore categorical data, but I will tell you that it’s uncommon to find one factor while ignoring assumptions and three with the correct specification. Here’s Ryne’s unofficial guide to categorical data models when you’re dead-set on using ML.

  • 1-8ish categorical variables: just run the model with the correct specification. The added time is fairly trivial.

  • 9-13ish categorical variables: run one version of the model once. Get a beverage. Come back, figure out how much time that took, and make a decision.

  • 12ish-20ish categorical variables: ignore categorical issues to start, either with raw data, a polychoric correlation matrix out of hetcor (see the sketch after this list), or WLS (which I’ll teach you momentarily), depending on the missing data pattern. Make most of my decisions on the imperfect models, then refit the final models with the correct specification, potentially on a grid or other large computing system.
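As a rough sketch of that “polychoric matrix first” shortcut (hetcor comes from the polycor package we loaded at the top; treat this as a starting point, not a full workflow):

# pairwise Pearson / polyserial / polychoric correlations in one shot
het <- hetcor(oneFactorJoint)
het$correlations   # could seed an EFA or a correlation-matrix SEM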

Other notes

  • Remember that you don’t always have to use this. Binary and nominal variables are totally fine to use as exogenous variables in regression, and this is just (crazy) regression.

  • You’ll be tempted to use this for things like level of education when there are 15 categories.

  • Don’t use this when you can’t assume an underlying latent distribution. This includes nominal variables (race) or when the categories aren’t clearly ordered (education: HS/college/masters/MD/PhD/JD).

  • It is (sometimes) ok to combine categories. In the education example, you can state that a PhD is just as good as an MD or JD, like you all just did at Thanksgiving.

  • Great time for questions.

WLS

Let’s pretend for a second that you actually have tons of categorical data, and you want to stick with SEM. Weighted Least Squares is an alternative to ML. Remember that for ML, we used this to compare the expected to the observed covariance matrix:

\[ ML = (n-1)\left(\log(\det(C)) + \mathrm{trace}(SC^{-1}) - \log(\det(S)) - n_{var}\right) \]

where \(S\) is your observed covariance matrix and \(C\) is the expected covariance matrix from your model. Raw data (FIML) versions generate the likelihood of each row by getting the normal density for each person, taking the log, and summing.

ML is great when you can assume multivariate normality and thus can rely on likelihood ratio (\(\chi^2\)) tests. However, Browne (1984) and many others have pointed out some limitations. Among them, the ML fit statistic is only chi-square distributed asymptotically, with no firm sample size at which we’re sure that holds. Additionally, categorical data are complex and carry (relatively) less information than continuous variables, and there tend to be many, many covariances to estimate once we get to item-level categorical data.

For example, a 30-item scale with 5-point Likert items would estimate \(\frac{30 \times 29}{2} = 435\) covariances and 120 thresholds. If you want ten people per parameter (a rule of thumb I don’t endorse), that’s a sample size of \(n = (435+120) \times 10 = 5{,}550\), which is ridiculous for validating a survey and out of range for most of you. For that reason, people have looked toward asymptotically distribution free (ADF) methods, mostly centering around least squares. They tend to look something like this:

\[ LeastSquares = \sum{(S - C)^2} \]

or equivalently,

\[ LeastSquares = \sum_i (s_i - c_i)^2 = \sum_i (s_i - c_i)(s_i - c_i) \]

where we take the differences between the observed and expected parameters, square them, and add them up. This gives us a nice residual-based method that works with categorical, continuous, and any other data you have. This is a least squares estimator. It works really well…unless there are really big differences in the variances/covariances.

# imagine this is our data
obsCor <- matrix(.5, 3, 3)
diag(obsCor) <- 1
obsCor
##      [,1] [,2] [,3]
## [1,]  1.0  0.5  0.5
## [2,]  0.5  1.0  0.5
## [3,]  0.5  0.5  1.0
# wls will work fine for this. but what if there are scaling differences?
obsSDs <- diag(c(10, 1, .1))

# this is like standardizing, but in reverse
obsCov <- obsSDs %*% obsCor %*% obsSDs
obsCov
##       [,1] [,2] [,3]
## [1,] 100.0 5.00 0.50
## [2,]   5.0 1.00 0.05
## [3,]   0.5 0.05 0.01
# in case you don't trust me
cov2cor(obsCov)
##      [,1] [,2] [,3]
## [1,]  1.0  0.5  0.5
## [2,]  0.5  1.0  0.5
## [3,]  0.5  0.5  1.0
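To see that scaling problem concretely, here’s a toy comparison (the “model-implied” values below are made up just to create some misfit):

# pretend a model reproduces obsCov exactly except for two elements
expCov <- obsCov
expCov[1, 2] <- expCov[2, 1] <- 4      # off by 1.00 on the big-variance pair
expCov[2, 3] <- expCov[3, 2] <- 0      # off by 0.05, but that wipes out the whole correlation
(obsCov - expCov)^2                    # the 1.00 miss dwarfs the 0.05 miss
sum((obsCov - expCov)^2)               # plain least squares mostly sees variable 1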

If we apply a basic least squares estimator to a covariance matrix like obsCov, we’re effectively deciding that variable 1 is 100-10,000 times more important than variable 3. There are many things you could try (e.g., standardizing your data), but there’s a more general solution: weight each element of \(S\) and \(C\) to correct for scaling differences. This is weighted least squares estimation, or simply WLS. Its fit function is:

\[ WLS = (s - c)^\top W^{-1}(s - c) \]

where \(s\) and \(c\) are our observed and expected parameters strung into vectors, and \(W\) is a matrix whose inverse supplies the weights that correct for these scaling differences.

We can choose the weight matrix \(W\) in a few ways:

  • Full WLS uses the full asymptotic covariance matrix (ACM) of the observed statistics as \(W\).

  • Diagonally weighted least squares (DWLS) keeps only the diagonal of the ACM.

  • Unweighted least squares (ULS) sets \(W\) to the identity, which takes us right back to the plain least squares above.

Which should you use? WLS if possible, DWLS if not.

Implementing WLS

Back in my day, WLS was hard. You had to build the asymptotic covariance matrix (the “acov”) yourself. Now, you swap out mxFitFunctionML for mxFitFunctionWLS and everything is done for you.

?mxFitFunctionWLS()

# update your model with the new fit function
joint2WLS <- mxModel(jointRes,
  mxFitFunctionWLS()
  )
# run it
wlsRes <- mxRun(joint2WLS)
## Running One Factor Six Items with 16 parameters
summary(wlsRes)
## Summary of One Factor Six Items 
##  
## free parameters:
##                                    name     matrix row col   Estimate
## 1                             lambda_x1          A   1   7 -0.5367832
## 2                             lambda_x2          A   2   7 -0.4458025
## 3                             lambda_x3          A   3   7 -0.4486570
## 4                             lambda_z1          A   4   7 -1.3583865
## 5                             lambda_z2          A   5   7 -1.1771761
## 6                             lambda_z3          A   6   7 -1.4898833
## 7                                    e1          S   1   1  0.3318199
## 8                                    e2          S   2   2  0.4052815
## 9                                    e3          S   3   3  0.3985408
## 10                                   m1          M   1   1  3.0047124
## 11                                   m2          M   1   2  3.0238321
## 12                                   m3          M   1   3  2.9543414
## 13 One Factor Six Items.Thresholds[1,1] Thresholds   1   1 -1.4735133
## 14 One Factor Six Items.Thresholds[1,2] Thresholds   1   2  0.1102402
## 15 One Factor Six Items.Thresholds[1,3] Thresholds   1   3 -0.8110467
## 16 One Factor Six Items.Thresholds[2,3] Thresholds   2   3  2.3614460
##     Std.Error
## 1  0.04481988
## 2  0.04326880
## 3  0.04508364
## 4  0.22895182
## 5  0.15760630
## 6  0.21504629
## 7  0.03461112
## 8  0.03501293
## 9  0.03440987
## 10 0.04446026
## 11 0.04513189
## 12 0.04385230
## 13 0.18451417
## 14 0.08603092
## 15 0.13132228
## 16 0.25709873
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (r'Wr units)
##        Model:             16                   2974               175.95
##    Saturated:             27                   2963                 0.00
## Independence:             12                   2978                   NA
## Number of observations/statistics: 500/2990
## 
## chi-square:  χ² ( df=9 ) = 175.95,  p = 3.54794e-33
## CFI: NA 
## TLI: NA   (also known as NNFI) 
## RMSEA:  0.1926136  [95% CI (0.1635804, 0.2227175)]
## Prob(RMSEA <= 0.05): 0
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2022-02-05 09:33:29 
## Wall clock time: 0.02041411 secs 
## optimizer:  CSOLNP 
## OpenMx version number: 2.14.11 
## Need help?  See help(mxSummary)
# what on earth is in the data?
# this is commented out because it prints a whole dataset
#wlsRes$data

Notes:

  • There is a lot more stuff in your data object now. The ACM and the weights are all there.

  • There are options to mxFitFunctionWLS, most notably the type of weight matrix (WLS, DWLS, or ULS).

  • If your data are large, you may want to take the data object from your fitted WLS model and use it elsewhere to save on recomputing the ACM (see the sketch below).
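Something like this (a sketch of the idea; the object names are just for illustration):

# reuse the data object, with its already-computed summary statistics and
#  weights, inside another model on the same variables
anotherWLS <- mxModel(simpJoint2, wlsRes$data, mxFitFunctionWLS())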

Pros and Cons of WLS

Pros include:

  • Your model with 30 categorical variables will converge before you die.

  • Scales very well with the number of variables.

  • When your model is good, it works almost as well as ML, and in seconds.

  • Did I mention it’s faster?

  • Works well when we violate normality assumptions, specifically highly skewed or non-normal data.

  • You get to read one of the million ML vs WLS papers out there.

Cons include:

  • There are many more methods for generating weight matrices, which were developed by throwing those letter magnets that were on your parents’ fridge at a wall.

  • As specified, we can’t do chi-square tests, but there are corrections.

  • Similarly, all fit stats are (in theory) slightly off, especially for ULS. That’s OK for relative fit, but not for absolute.

  • Covariance matrices and ACMs are built piece by piece (just like cor), and thus aren’t guaranteed to be positive definite.

    • This is doubly true for the WLS full weight matrix. Remember our example with 555 parameters? The ACM is 555 by 555 (and symmetric), yielding 555 variances and \(\frac{555*554}{2} = 153,735\) covariances, built pairwise. There are going to be some problems with finite samples.