Here are the libraries we need.

```
## load required libraries
# uncomment this if you haven't installed something
# install.packages(c("knitr", "OpenMx", "beepr", "polycor"))
library(knitr)
library(OpenMx)
library(beepr)
library(polycor)
```

We all remember the basic levels of measurement:

Nominal, where different numbers mean different categories (but nothing else).

Ordinal, where different numbers convey ordered categories (but not any information about spacing between categories).

Interval, where different numbers convey both order and spacing (but not anything about ratios).

Ratio, where different numbers convey order, spacing, and ratios.

Today is not about nominal data, which rarely happens in psychology or SEM. It is also not about ratio data, because you’re psychologists (except those of you who aren’t). Today is about how to deal with ordinal data in SEM (with the caveat that binary data is automatically ordinal). We typically call ordinal data “categorical”, while calling interval or ratio data “continuous”.

Here’s the big question before we start:

Most psychological data is “intervalish”, where data are clearly ordinal, but it might not make that much of a difference if you treat the data as continuous. Rhemtulla et al. (2012) ran a simulation showing what happens when you pretend ordinal data are continuous under varying conditions.

The biggest issue is the number of categories:

“Most” agree that five or fewer categories is ordinal.

I’ve seen people argue that 10 or more categories is clearly continuous.

In reality, it’s context dependent.

Other factors to consider:

Are there really groups? If grouping is real, then the tools we’re about to use don’t make sense.

Do you need to do anything at all? We include binary exogenous variables in regression all the time with no theoretical problems.

Is the model you want to fit possible? The method we’re about to learn will not work with dozens or hundreds of categorical variables. “Slightly wrong but converges prior to graduation” is a useful model.

For the rest of today, we’re going to be approaching ordinal variables as though they are single indicators of a continuous latent variable. It will generally improve the accuracy of our results at a considerable computational cost. Then we’ll talk about other solutions, specifically alternatives to ML that are faster.

Let’s do this in reverse. We have a single item: “How happy are you at your job?”, which is measured with an unanchored five-point Likert scale. However, we hypothesize that job satisfaction is really continuous, and if we had a better instrument, we’d get continuous data. That continuous data might look something like this:

```
# death date of wojtek the bear,
# a brown bear that was drafted into the Polish army in WW2
# https://en.wikipedia.org/wiki/Wojtek_(bear)
set.seed(12021963)
# do you know who really liked his job? wojtek
jobSat <- rnorm(674, mean=3, sd=1.5)
hist(jobSat, breaks=60)
# but we only get numbers from one to five
hist(jobSat, breaks=60)
abline(v=c(1.5,2.5,3.5,4.5), lwd=3)
text(1:5, 33, 1:5, cex=2)
```

```
# so anyone whose true score is lower than 1.5 will say 1 on a 1-5 scale.
# however, those thresholds could be anywhere, and they don't have to be evenly spaced
hist(jobSat, breaks=60)
abline(v=c(-.5, 2, 2.5, 4.1), lwd=3)
text(c(-2, 1, 2.2, 3.4, 5), 33, 1:5, cex=2)
```

The position of those lines are called *thresholds*, and they describe the score on the unmeasured continuous latent variable where you transition between observed categories. There are always \(k-1\) thresholds for \(k\) categories. We can freely estimate those thresholds if we treat the underlying latent variable like any other latent variable and make two constraints (most often mean and variance) for identification.

With complete data, these thresholds can be analytically derived. If 10% of scores are in the lowest category, then the first threshold sits 1.2815516 standard deviations below the mean of the latent variable. If the mean and variance of the latent continuous variable are fixed to 0 and 1, then the threshold is just qnorm(.10) = -1.2815516.
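To make that concrete, here’s a quick sketch (with made-up category proportions) that recovers all \(k-1\) thresholds from cumulative proportions, assuming a standard-normal latent variable:

```
# hypothetical proportions in each of 5 categories
props <- c(.10, .20, .40, .20, .10)
# thresholds are normal quantiles of the cumulative proportions
# (drop the last cumulative value, which is always 1)
thresh <- qnorm(cumsum(props)[-5])
round(thresh, 4)
# the first threshold is qnorm(.10) = -1.2816, matching the text
```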

With missing data, the thresholds must be estimated, and FIML will give you better estimates of the thresholds due to the covariances between items. Put differently, if someone is missing on the job satisfaction item but reports a 5 out of 5 on all other items from the same factor, then the estimated threshold should account for the fact that this person likely scored a 5 (maybe a 4, definitely not a 1) on job satisfaction.

Covariances are always estimated, just like they were in polycor when we did EFA like 10 years ago. Let’s review/learn how this should work. If our data were continuous, we’d get this:

```
# job satisfaction exists. how about some measure of boss performance?
set.seed(12021963)
bossRespect <- .6 * jobSat + 1.3*rnorm(length(jobSat), 0, 1)
cor(jobSat, bossRespect)
```

`## [1] 1`

```
# plot them
plot(jobSat, bossRespect)
# categorize them
abline(v=c(0, 1.5, 3.25, 4), lwd=2)
abline(h=c(-1, 0.66, 1.9), lwd=2)
```

```
# actually code
jobSatCat <- 1 + (jobSat>0) + (jobSat>1.5) + (jobSat>3.25) + (jobSat>4)
bossRespectCat <- 1 + (bossRespect>(-1)) + (bossRespect>.66) + (bossRespect>1.9)
cor(jobSatCat, bossRespectCat)
```

`## [1] 0.8636247`

So we can see (a) the relationship between the continuous and ordinal versions, and (b) the attenuation of the correlation between these two variables when we violate assumptions. Just like we have pnorm, which gives the probability of specific values and ranges from the (univariate) normal distribution, we have multivariate versions that account for not only the means and variances, but also the covariance between the two variables.

```
# here's a frequency table
table(jobSatCat, bossRespectCat)
```

```
## bossRespectCat
## jobSatCat 1 2 3 4
## 1 20 0 0 0
## 2 64 53 0 0
## 3 0 97 140 27
## 4 0 0 0 124
## 5 0 0 0 149
```

```
# and here it is as percentages/proportions
length(jobSatCat)
```

`## [1] 674`

`table(jobSatCat, bossRespectCat)/length(jobSatCat)`

```
## bossRespectCat
## jobSatCat 1 2 3 4
## 1 0.02967359 0.00000000 0.00000000 0.00000000
## 2 0.09495549 0.07863501 0.00000000 0.00000000
## 3 0.00000000 0.14391691 0.20771513 0.04005935
## 4 0.00000000 0.00000000 0.00000000 0.18397626
## 5 0.00000000 0.00000000 0.00000000 0.22106825
```

```
# just need to see which correlations generate expected joint probabilities that best match up with this table
# thankfully, the polycor library and its polychor() function do this for us
polychor(jobSatCat, bossRespectCat)
```

```
## Warning in optimise(f, interval = c(-maxcor, maxcor)): NA/Inf replaced by
## maximum positive value
## Warning in optimise(f, interval = c(-maxcor, maxcor)): NA/Inf replaced by
## maximum positive value
## Warning in optimise(f, interval = c(-maxcor, maxcor)): NA/Inf replaced by
## maximum positive value
```

`## [1] 0.9828255`
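Under the hood, polychor is searching for the correlation whose implied bivariate-normal cell probabilities best match the observed table. Here’s a sketch of one such expected cell probability, approximated by Monte Carlo with MASS (which ships with R); the candidate correlation and the thresholds are illustrative:

```
# a Monte Carlo stand-in for the bivariate-normal integral
library(MASS)
set.seed(1)
rho <- .98
draws <- mvrnorm(1e5, mu=c(0, 0), Sigma=matrix(c(1, rho, rho, 1), 2, 2))
# expected proportion in the (1,1) cell:
# both standardized latent scores below an illustrative first threshold of -1
mean(draws[, 1] < -1 & draws[, 2] < -1)
```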

Categorical SEM requires that we specify which variables are categorical and model their thresholds. First, we tell R that some of your data are ordinal categories. R calls these “factors”, because it’s always nice to have the same word mean different things in the same context. There are two ways to tell R something is a factor. First, there is the factor() function, which requires some arguments to specify both what the categories are and in what order. By default, R sorts the unique values to determine the level order, which may not be the order you intend.

As this is a minor pain to move around, someone wrote mxFactor, which takes two arguments: data, and what you want (in order) the categories to be.
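A minimal comparison of the two (the numeric vector here is illustrative):

```
x <- c(2, 1, 3, 2, 1)
# base R sorts the unique values to pick the level order
levels(factor(x))
# mxFactor makes you state the levels explicitly, in order,
# and returns an ordered factor
mxFactor(x, levels=c(1, 2, 3))
```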

```
# load old data
data(myFADataRaw)
# grab just the ordinal variables
oneFactorOrd <- myFADataRaw[,c("z1", "z2", "z3")]
summary(oneFactorOrd)
```

```
## z1 z2 z3
## Min. :0.00 Min. :0.00 Min. :0.000
## 1st Qu.:1.00 1st Qu.:0.00 1st Qu.:0.000
## Median :1.00 Median :0.00 Median :1.000
## Mean :0.82 Mean :0.48 Mean :0.782
## 3rd Qu.:1.00 3rd Qu.:1.00 3rd Qu.:1.000
## Max. :1.00 Max. :1.00 Max. :2.000
```

```
# for caroline
oneFactorOrd$z1[1:10] <- NA
# use mxFactor to specify that these are factors
oneFactorOrd$z1 <- mxFactor(oneFactorOrd$z1, levels=c(0, 1))
oneFactorOrd$z2 <- mxFactor(oneFactorOrd$z2, levels=c(0, 1))
oneFactorOrd$z3 <- mxFactor(oneFactorOrd$z3, levels=c(0, 1, 2))
# look at what's different!
summary(oneFactorOrd)
```

```
## z1 z2 z3
## 0 : 89 0:260 0:157
## 1 :401 1:240 1:295
## NA's: 10 2: 48
```

Now, whenever we enter these data in a model, we have to account for the fact that they are ordinal. We have to provide estimates for these thresholds, which we again do in two ways. First, with a matrix!

```
thresh <- mxMatrix( type="Full", nrow=2, ncol=3,
free=c(TRUE,TRUE,TRUE,FALSE,FALSE,TRUE),
values=c(-1,0,-.5,NA,NA,1.2), byrow=TRUE, name="thresh" )
```

Notes:

The threshold matrix is 2 x 3. There is a column for every categorical variable, and a row for every possible threshold.

Variable z3 needs two thresholds, so there are two rows. Variables z1 and z2 need one threshold, so they leave the second row empty. Note that these extra spots in the threshold matrix are always fixed, never free.

Other thresholds may be freely estimated.

Thresholds are strictly increasing.

No one cares how many categories you meant to collect. If you have a seven-point Likert, but no one says 5, you have a six-point Likert.

Once you have this matrix, it is time to put it in the model. Just as the RAM expectation has slots for A, S, F, and M, both the RAM and normal expectations provide a slot for thresholds, as shown.

`mxExpectationRAM(A="A", S="S", F="F", M="M", thresholds="thresh")`

```
## MxExpectationRAM 'expectation'
## $A : 'A'
## $S : 'S'
## $F : 'F'
## $M : 'M'
## $dims : NA
## $thresholds : 'thresh'
```

Most of the time, you will use paths. Just as we use mxPath to draw paths, we use mxThreshold to create thresholds. First, let’s make a model:

```
simpCat <- mxModel("three categorical items",
type='RAM',
mxData(oneFactorOrd, "raw"),
manifestVars=c("z1", "z2", "z3")
)
simpCat
```

```
## MxModel 'three categorical items'
## type : RAM
## $matrices : 'A', 'S', and 'F'
## $algebras : NULL
## $constraints : NULL
## $intervals : NULL
## $latentVars : none
## $manifestVars : 'z1', 'z2', and 'z3'
## $data : 500 x 3
## $data means : NA
## $data type: 'raw'
## $submodels : NULL
## $expectation : MxExpectationRAM
## $fitfunction : MxFitFunctionML
## $compute : NULL
## $independent : FALSE
## $options :
## $output : FALSE
```

Note what is missing. M and Thresholds have not been created, and won’t be until we make paths/thresholds that reference them. First, let’s complete the rest of the model. I’m going to fix the item means/variances for identification. Then, I add the thresholds using mxThreshold.

```
simpCat <- mxModel("three categorical items",
type='RAM',
mxData(oneFactorOrd, "raw"),
manifestVars=c("z1", "z2", "z3"),
latentVars="factor",
# variances
mxPath(c("z1", "z2", "z3"), arrows=2, values=1, free=FALSE),
# covariances
#mxPath(c("z1", "z2", "z3"), connect="unique.bivariate",
# arrows=2, values=.5, free=TRUE,
# labels=c("c12", "c13", "c23")),
mxPath("factor", to=c("z1", "z2", "z3"),
arrows=1, values=.8,
free=TRUE,
labels=c("lambdaZ1", "lambdaZ2", "lambdaZ3")),
mxPath("factor", arrows=2, values=1, free=FALSE),
# means
mxPath("one", c("z1", "z2", "z3"), arrows=1, values=0, free=FALSE),
# thresholds
mxThreshold(c("z1", "z2", "z3"), nThresh = c(1, 1, 2), free=TRUE,
labels=c("z1t1", "z2t1", "z3t1", "z3t2"))
)
simpCat
```

```
## MxModel 'three categorical items'
## type : RAM
## $matrices : 'A', 'S', 'F', 'M', and 'Thresholds'
## $algebras : NULL
## $constraints : NULL
## $intervals : NULL
## $latentVars : 'factor'
## $manifestVars : 'z1', 'z2', and 'z3'
## $data : 500 x 3
## $data means : NA
## $data type: 'raw'
## $submodels : NULL
## $expectation : MxExpectationRAM
## $fitfunction : MxFitFunctionML
## $compute : NULL
## $independent : FALSE
## $options :
## $output : FALSE
```

`simpCat$M`

```
## FullMatrix 'M'
##
## $labels: No labels assigned.
##
## $values
## z1 z2 z3 factor
## [1,] 0 0 0 0
##
## $free: No free parameters.
##
## $lbound: No lower bounds assigned.
##
## $ubound: No upper bounds assigned.
```

`simpCat$Thresholds`

```
## FullMatrix 'Thresholds'
##
## $labels
## z1 z2 z3
## [1,] "z1t1" "z2t1" "z3t1"
## [2,] NA NA "z3t2"
##
## $values
## z1 z2 z3
## [1,] 0 0 -0.4307273
## [2,] 0 0 0.4307273
##
## $free
## z1 z2 z3
## [1,] TRUE TRUE TRUE
## [2,] FALSE FALSE TRUE
##
## $lbound: No lower bounds assigned.
##
## $ubound: No upper bounds assigned.
```

Notes:

mxThreshold creates a threshold matrix, called Thresholds.

Note that free=TRUE marks only the valid thresholds as freely estimated.

If you do not give starting values (tsk tsk), OpenMx will place them at the normal quantiles assuming all categories are evenly used. So, binary puts the single threshold at 0, ternary puts them at plus/minus .43, etc.

You can and should use labels.
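You can verify those default starting values yourself; they’re just the even-usage normal quantiles:

```
# binary: one threshold at qnorm(.5) = 0
qnorm(cumsum(rep(1/2, 2))[-2])
# ternary: thresholds at the 1/3 and 2/3 quantiles, -0.4307 and 0.4307,
# matching the Thresholds matrix printed above
round(qnorm(cumsum(rep(1/3, 3))[-3]), 4)
```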

How’d this go?

```
# run the model
threshOut <- mxRun(simpCat)
```

`## Running three categorical items with 7 parameters`

`summary(threshOut)`

```
## Summary of three categorical items
##
## free parameters:
## name matrix row col Estimate Std.Error A
## 1 lambdaZ1 A 1 4 1.47355556 0.29849911
## 2 lambdaZ2 A 2 4 1.29946007 0.21447096
## 3 lambdaZ3 A 3 4 1.60887975 0.32256699
## 4 z1t1 Thresholds 1 1 -1.60575353 0.24045528
## 5 z2t1 Thresholds 1 2 0.07616229 0.09193602
## 6 z3t1 Thresholds 1 3 -0.92062395 0.16955756
## 7 z3t2 Thresholds 2 3 2.48378670 0.37855822
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 7 1483 1813.581
## Saturated: 9 1481 NA
## Independence: 6 1484 NA
## Number of observations/statistics: 500/1490
##
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: -1152.419 1827.581 1827.809
## BIC: -7402.682 1857.084 1834.865
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2022-02-05 09:33:25
## Wall clock time: 0.04426789 secs
## optimizer: CSOLNP
## OpenMx version number: 2.14.11
## Need help? See help(mxSummary)
```

`mxStandardizeRAMpaths(threshOut)`

```
## name label matrix row col Raw.Value
## 1 three categorical items.A[1,4] lambdaZ1 A z1 factor 1.473556
## 2 three categorical items.A[2,4] lambdaZ2 A z2 factor 1.299460
## 3 three categorical items.A[3,4] lambdaZ3 A z3 factor 1.608880
## 4 three categorical items.S[1,1] <NA> S z1 z1 1.000000
## 5 three categorical items.S[2,2] <NA> S z2 z2 1.000000
## 6 three categorical items.S[3,3] <NA> S z3 z3 1.000000
## 7 three categorical items.S[4,4] <NA> S factor factor 1.000000
## Raw.SE Std.Value Std.SE
## 1 not_requested 0.8274530 not_requested
## 2 not_requested 0.7925016 not_requested
## 3 not_requested 0.8493124 not_requested
## 4 not_requested 0.3153215 not_requested
## 5 not_requested 0.3719413 not_requested
## 6 not_requested 0.2786684 not_requested
## 7 not_requested 1.0000000 not_requested
```

Historically, you’ve had to switch between continuous data ML models and categorical data models. However, modern tools allow you to mix and match continuous and ordinal data. Here we’re adding several of the continuous X variables from our demo data.

```
# get the data
oneFactorJoint <- myFADataRaw[,c("x1","x2","x3","z1","z2","z3")]
# make the ordinal variables
oneFactorJoint$z1 <- mxFactor(oneFactorOrd$z1, levels=c(0,1))
oneFactorJoint$z2 <- mxFactor(oneFactorOrd$z2, levels=c(0,1))
oneFactorJoint$z3 <- mxFactor(oneFactorOrd$z3, levels=c(0,1,2))
# make life easier
xVar <- c("x1","x2","x3")
zVar <- c("z1","z2","z3")
simpJoint <- mxModel("One Factor Six Items",
type='RAM',
mxData(oneFactorJoint, "raw"),
manifestVars=c(xVar, zVar),
latentVars="f",
## variances
# variances: z are fixed
mxPath(zVar, arrows=2, values=1, free=FALSE),
# variances: x are free
mxPath(xVar, arrows=2, values=1, free=TRUE, labels=c("e1", "e2", "e3")),
# variances: factor is fixed
mxPath("f", arrows=2, values=1, free=FALSE),
## loadings:
mxPath("f", c(xVar, zVar), values=.6, free=TRUE,
labels=paste("lambda", c(xVar, zVar), sep="_")),
## means: z is fixed, x is free
mxPath("one", zVar, arrows=1, values=0, free=FALSE),
mxPath("one", xVar, arrows=1, values=0, free=TRUE, labels=c("m1", "m2", "m3")),
# thresholds
mxThreshold(zVar, nThresh = c(1, 1, 2), free=TRUE)
)
simpJoint
```

```
## MxModel 'One Factor Six Items'
## type : RAM
## $matrices : 'A', 'S', 'F', 'M', and 'Thresholds'
## $algebras : NULL
## $constraints : NULL
## $intervals : NULL
## $latentVars : 'f'
## $manifestVars : 'x1', 'x2', 'x3', 'z1', 'z2', and 'z3'
## $data : 500 x 6
## $data means : NA
## $data type: 'raw'
## $submodels : NULL
## $expectation : MxExpectationRAM
## $fitfunction : MxFitFunctionML
## $compute : NULL
## $independent : FALSE
## $options :
## $output : FALSE
```

`jointRes <- mxRun(simpJoint)`

`## Running One Factor Six Items with 16 parameters`

`jointRefModels <- mxRefModels(jointRes, run=TRUE)`

`## Running Saturated One Factor Six Items with 25 parameters`

`## Running Independence One Factor Six Items with 10 parameters`

`summary(jointRes, refModels=jointRefModels)`

```
## Summary of One Factor Six Items
##
## free parameters:
## name matrix row col Estimate
## 1 lambda_x1 A 1 7 -0.80432360
## 2 lambda_x2 A 2 7 -0.79324082
## 3 lambda_x3 A 3 7 -0.75608369
## 4 lambda_z1 A 4 7 -0.28209557
## 5 lambda_z2 A 5 7 -0.23699108
## 6 lambda_z3 A 6 7 -0.21029422
## 7 e1 S 1 1 0.34801624
## 8 e2 S 2 2 0.39330374
## 9 e3 S 3 3 0.41004905
## 10 m1 M 1 1 2.98813184
## 11 m2 M 1 2 3.01152140
## 12 m3 M 1 3 2.98628305
## 13 One Factor Six Items.Thresholds[1,1] Thresholds 1 1 -0.94213284
## 14 One Factor Six Items.Thresholds[1,2] Thresholds 1 2 0.05139727
## 15 One Factor Six Items.Thresholds[1,3] Thresholds 1 3 -0.49386745
## 16 One Factor Six Items.Thresholds[2,3] Thresholds 2 3 1.32970654
## Std.Error A
## 1 0.04115534
## 2 0.04192112
## 3 0.04126926
## 4 0.08023232
## 5 0.06524504
## 6 0.05896615
## 7 0.03742011
## 8 0.03854482
## 9 0.03730083
## 10 0.04461955
## 11 0.04523319
## 12 0.04432049
## 13 0.07015752
## 14 0.05762763
## 15 0.05999381
## 16 0.07905732
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 16 2974 5715.821
## Saturated: 25 2965 5480.597
## Independence: 10 2980 6313.382
## Number of observations/statistics: 500/2990
##
## chi-square: χ² ( df=9 ) = 235.2244, p = 1.305043e-45
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: -232.1791 5747.821 5748.947
## BIC: -12766.4236 5815.255 5764.470
## CFI: 0.7233695
## TLI: 0.5389491 (also known as NNFI)
## RMSEA: 0.2242144 [95% CI (0.1952074, 0.25416)]
## Prob(RMSEA <= 0.05): 0
## timestamp: 2022-02-05 09:33:27
## Wall clock time: 1.489409 secs
## optimizer: CSOLNP
## OpenMx version number: 2.14.11
## Need help? See help(mxSummary)
```

That fits poorly. Maybe it’s two factors?

```
simpJoint2 <- mxModel("Two Factor Six Items",
type='RAM',
mxData(oneFactorJoint, "raw"),
manifestVars=c(xVar, zVar),
latentVars=c("facX", "facZ"),
## variances
# variances: z are fixed
mxPath(zVar, arrows=2, values=1, free=FALSE),
# variances: x are free
mxPath(xVar, arrows=2, values=1, free=TRUE, labels=c("e1", "e2", "e3")),
# variances: factor is fixed
mxPath("facX", arrows=2, values=1, free=FALSE),
mxPath("facZ", arrows=2, values=1, free=FALSE),
# factor covaraince is estimated
mxPath("facX", "facZ", arrows=2, values=.5, free=TRUE, label="facCov"),
## loadings:
mxPath("facX", xVar, values=.6, free=TRUE,
labels=paste("lambda", xVar, sep="_")),
mxPath("facZ", zVar, values=.6, free=TRUE,
labels=paste("lambda", zVar, sep="_")),
## means: z is fixed, x is free
mxPath("one", zVar, arrows=1, values=0, free=FALSE),
mxPath("one", xVar, arrows=1, values=0, free=TRUE, labels=c("m1", "m2", "m3")),
# thresholds
mxThreshold(zVar, nThresh = c(1, 1, 2), free=TRUE)
)
simpJoint2
```

```
## MxModel 'Two Factor Six Items'
## type : RAM
## $matrices : 'A', 'S', 'F', 'M', and 'Thresholds'
## $algebras : NULL
## $constraints : NULL
## $intervals : NULL
## $latentVars : 'facX' and 'facZ'
## $manifestVars : 'x1', 'x2', 'x3', 'z1', 'z2', and 'z3'
## $data : 500 x 6
## $data means : NA
## $data type: 'raw'
## $submodels : NULL
## $expectation : MxExpectationRAM
## $fitfunction : MxFitFunctionML
## $compute : NULL
## $independent : FALSE
## $options :
## $output : FALSE
```

`jointRes2 <- mxRun(simpJoint2)`

`## Running Two Factor Six Items with 17 parameters`

`summary(jointRes2, refModels=jointRefModels)`

```
## Summary of Two Factor Six Items
##
## free parameters:
## name matrix row col Estimate
## 1 lambda_x1 A 1 7 0.80561005
## 2 lambda_x2 A 2 7 0.79578208
## 3 lambda_x3 A 3 7 0.75899629
## 4 lambda_z1 A 4 8 1.47471526
## 5 lambda_z2 A 5 8 1.32812671
## 6 lambda_z3 A 6 8 1.54612092
## 7 e1 S 1 1 0.34625304
## 8 e2 S 2 2 0.38956509
## 9 e3 S 3 3 0.40590823
## 10 facCov S 7 8 0.21776577
## 11 m1 M 1 1 2.98791303
## 12 m2 M 1 2 3.01130545
## 13 m3 M 1 3 2.98607725
## 14 Two Factor Six Items.Thresholds[1,1] Thresholds 1 1 -1.60652409
## 15 Two Factor Six Items.Thresholds[1,2] Thresholds 1 2 0.08043287
## 16 Two Factor Six Items.Thresholds[1,3] Thresholds 1 3 -0.89157299
## 17 Two Factor Six Items.Thresholds[2,3] Thresholds 2 3 2.40920499
## Std.Error A
## 1 0.04132511
## 2 0.04201742
## 3 0.04135989
## 4 0.29376983
## 5 0.21863098
## 6 0.29055359
## 7 0.03782537
## 8 0.03875186
## 9 0.03745735
## 10 0.05718369
## 11 0.04461541
## 12 0.04522921
## 13 0.04431680
## 14 0.23767099
## 15 0.09330719
## 16 0.15604905
## 17 0.33926001
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 17 2973 5493.192
## Saturated: 25 2965 5480.597
## Independence: 10 2980 6313.382
## Number of observations/statistics: 500/2990
##
## chi-square: χ² ( df=8 ) = 12.59555, p = 0.1265438
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: -452.8079 5527.192 5528.462
## BIC: -12982.8378 5598.840 5544.881
## CFI: 0.9943805
## TLI: 0.9894634 (also known as NNFI)
## RMSEA: 0.03389526 [95% CI (0, 0.07336028)]
## Prob(RMSEA <= 0.05): 0.7476977
## timestamp: 2022-02-05 09:33:29
## Wall clock time: 0.9125268 secs
## optimizer: CSOLNP
## OpenMx version number: 2.14.11
## Need help? See help(mxSummary)
```

The primary issue with ordinal FIML is that it is computationally intensive. At each major and minor iteration, the expected covariance matrix must be integrated to generate expected probabilities for each person’s responses. This is incredibly expensive, and it grows exponentially as the number of ordinal variables increases, because each additional variable adds a dimension to the integral.

As a general rule, each additional ordinal variable increases processing time by up to a factor of 8. How bad is that? If a ten-variable model can be estimated in 1 second:

12 variables will take 64 seconds, or just over a minute,

14 variables will take approximately 68 minutes, or 1:08 hours,

16 variables will take approximately 72.82 hours, or just over 3 days, and

20 variables will take 298,261 hours, or approximately 34 years.
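Those projections are just powers of 8; here’s the arithmetic:

```
# 1 second at 10 variables, times 8 for each additional ordinal variable
extra <- c(2, 4, 6, 10)
secs <- 8^extra
data.frame(nVars = 10 + extra,
           seconds = secs,
           hours = round(secs / 3600, 2))
```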

Thankfully, most 10 variable models take less than 1 second (or at least the integration part). The algorithm that OpenMx uses for multivariate normal integration is hard-capped at 20 variables. If you have more than 20 ordinal variables, you must use another tool.

So the big question here is “does this really matter?” Sometimes it will, and sometimes it won’t. The best way to check is to run things both ways and see what changes. Here’s the one factor model again, this time ignoring any categorical data assumptions.

```
# did he forget about the bear joke thing?
# get the data
oneFactorIgnorant <- myFADataRaw[,c("x1","x2","x3","z1","z2","z3")]
# make life easier
xVar <- c("x1","x2","x3")
zVar <- c("z1","z2","z3")
simpIgnorant <- mxModel("One Factor Six Items No Assumptions",
type='RAM',
mxData(oneFactorIgnorant, "raw"),
manifestVars=c(xVar, zVar),
latentVars="f",
## variances
# variances: all residuals are free
mxPath(c(xVar, zVar), arrows=2, values=1, free=TRUE, labels=c("e1", "e2", "e3", "e4", "e5", "e6")),
# variances: factor is fixed
mxPath("f", arrows=2, values=1, free=FALSE),
# loadings:
mxPath("f", c(xVar, zVar), values=.6, free=TRUE,
labels=paste("lambda", c(xVar, zVar), sep="_")),
# means: all free (no categorical constraints here)
mxPath("one", zVar, arrows=1, values=0, free=TRUE, labels=c("m4", "m5", "m6")),
mxPath("one", xVar, arrows=1, values=0, free=TRUE, labels=c("m1", "m2", "m3"))
)
ignorantRes <- mxRun(simpIgnorant)
```

`## Running One Factor Six Items No Assumptions with 18 parameters`

`ignorantRefModels <- mxRefModels(ignorantRes, run=TRUE)`

`## Running Saturated One Factor Six Items No Assumptions with 27 parameters`

`## Running Independence One Factor Six Items No Assumptions with 12 parameters`

`summary(ignorantRes, refModels=ignorantRefModels)`

```
## Summary of One Factor Six Items No Assumptions
##
## free parameters:
## name matrix row col Estimate Std.Error A
## 1 lambda_x1 A 1 7 0.80465798 0.041211658
## 2 lambda_x2 A 2 7 0.79386836 0.041971757
## 3 lambda_x3 A 3 7 0.75617502 0.041316129
## 4 lambda_z1 A 4 7 0.06461368 0.018837732
## 5 lambda_z2 A 5 7 0.09076110 0.024504197
## 6 lambda_z3 A 6 7 0.10755475 0.029662385
## 7 e1 S 1 1 0.34779789 0.037471792
## 8 e2 S 2 2 0.39261872 0.038590883
## 9 e3 S 3 3 0.41019338 0.037324557
## 10 e4 S 4 4 0.14342500 0.009129564
## 11 e5 S 5 5 0.24136236 0.015382806
## 12 e6 S 6 6 0.35090791 0.022366233
## 13 m1 M 1 1 2.98795172 0.044615488
## 14 m2 M 1 2 3.01134367 0.045229292
## 15 m3 M 1 3 2.98611381 0.044316873
## 16 m4 M 1 4 0.81999994 0.017181381
## 17 m5 M 1 5 0.47999985 0.022342804
## 18 m6 M 1 6 0.78199995 0.026924962
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 18 2982 5759.513
## Saturated: 27 2973 5529.256
## Independence: 12 2988 6355.500
## Number of observations/statistics: 500/3000
##
## chi-square: χ² ( df=9 ) = 230.2564, p = 1.452904e-44
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: -204.4873 5795.513 5796.935
## BIC: -12772.4487 5871.376 5814.243
## CFI: 0.7272628
## TLI: 0.545438 (also known as NNFI)
## RMSEA: 0.2217388 [95% CI (0.1927306, 0.2516949)]
## Prob(RMSEA <= 0.05): 0
## timestamp: 2022-02-05 09:33:29
## Wall clock time: 0.0297761 secs
## optimizer: CSOLNP
## OpenMx version number: 2.14.11
## Need help? See help(mxSummary)
```

```
# what changed?
summary(jointRes, refModels=jointRefModels)$CFI
```

`## [1] 0.7233695`

`summary(ignorantRes, refModels=ignorantRefModels)$CFI`

`## [1] 0.7272628`

Ignoring things gave us mostly the right answer. Why? There’s only one threshold for two of the three variables. I’ll never tell you to ignore categorical data, but I will tell you that it’s uncommon to find one factor while ignoring assumptions and three with correct specification. Here’s Ryne’s unofficial guide to categorical data models if I’m dead-set on using ML.

1-8ish categorical variables: just run the model with the correct specification. The added time is fairly trivial.

9-13ish categorical variables: run one version of the model once. Get a beverage. Come back, figure out how much time that took, and make a decision.

12ish-20ish categorical variables: ignore categorical issues to start, either with raw data, a polychoric correlation matrix out of hetcor, or WLS (which I’ll teach you momentarily), depending on the missing data pattern. Make most of my decisions on the imperfect models, then refit the final models with the correct specification, potentially on a grid or other large computing system.
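For the hetcor route, a sketch of building a mixed correlation matrix from the demo data (base ordered() is enough here, since hetcor doesn’t need mxFactor):

```
# hetcor chooses Pearson, polyserial, or polychoric per pair of variables
mixed <- myFADataRaw[, c("x1", "x2", "z1", "z2")]
mixed$z1 <- ordered(mixed$z1)
mixed$z2 <- ordered(mixed$z2)
hetcor(mixed)$correlations
```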

Remember that you don’t always have to use this. Binary and nominal variables are totally fine to use as exogenous variables in regression, and this is just (crazy) regression.

You’ll be tempted to use this for things like level of education when there are 15 categories.

Don’t use this when you can’t assume an underlying latent distribution. This includes nominal variables (race) or when the categories aren’t clearly ordered (education: HS/college/masters/MD/PhD/JD).

It is (sometimes) ok to combine categories. In the education example, you can state that a PhD is just as good as an MD or JD, like you all just did at Thanksgiving.
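A sketch of that collapsing step, with a hypothetical education vector:

```
edu <- c("HS", "college", "masters", "PhD", "MD", "JD", "college")
# declare the three doctorates equivalent before ordering the categories
edu2 <- ifelse(edu %in% c("PhD", "MD", "JD"), "doctorate", edu)
mxFactor(edu2, levels=c("HS", "college", "masters", "doctorate"))
```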

Great time for questions.

Let’s pretend for a second that you actually have tons of categorical data, and you want to stick with SEM. Weighted Least Squares is an alternative to ML. So remember that for ML, we used this to compare expected to observed covariance matrix:

\[ \mathrm{ML} = (n-1)\big(\log(\det(C)) + \mathrm{trace}(SC^{-1}) - \log(\det(S)) - \mathrm{nvar}\big) \]

where \(S\) is your observed covariance matrix and \(C\) is the expected covariance matrix from your model. Raw data (FIML) versions generate the likelihood of each row by getting the normal density for each person, logging, and summing.
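For continuous data, the row-wise (FIML) idea can be sketched with base R alone; the model-implied mean and covariance below are made up for illustration:

```
# -2 log-likelihood of raw data rows under made-up model-implied values
C <- matrix(c(1, .5, .5, 1), 2, 2)        # model-implied covariance
mu <- c(0, 0)                             # model-implied means
rows <- rbind(c(0.3, -0.2), c(1.1, 0.8))  # two people, two variables
# multivariate-normal log density per row, then sum and multiply by -2
logdens <- -0.5 * (ncol(rows) * log(2 * pi) + log(det(C)) +
                   mahalanobis(rows, center=mu, cov=C))
-2 * sum(logdens)
```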

ML is great when you can assume multivariate normality and thus can rely on likelihood ratio (\(\chi^2\)) tests. However, Browne (1984) and many others have pointed out some limitations of this. Among them, ML is only chi-square distributed as the sample size increases, with no real value under which we’re absolutely sure it takes hold. Additionally, categorical data are complex and have a (relative) lack of information compared to continuous variables, and there tend to be many many covariances estimated once we get to item-level categorical data.

For example, a 30-item scale with 5-point likerts would estimate \(\frac{30*29}{2} = 435\) covariances and 120 thresholds. If you want ten people per parameter (which is a rule of thumb I don’t endorse), that’s a sample size of n=(435+120)*10 = 5,550, which is ridiculous for validating a survey and out of range for most of you. For that reason, people have looked towards asymptotically distribution free (ADF) methods, mostly centering around least squares. They tend to look something like this:

\[ LeastSquares = \sum{(S - C)^2} \]

or equivalently,

\[ \mathrm{LeastSquares} = \sum_i (s_i - c_i)^2 = \sum_i (s_i - c_i)(s_i - c_i) \]

where we take the differences between the observed and expected parameters, square them, and add them up. This gives us a nice residual-based method that works with categorical, continuous, and any other data you have. This is a *least squares* estimator. That works really well…unless there are really big differences in variances/covariances.

```
# imagine this is our data
obsCor <- matrix(.5, 3, 3)
diag(obsCor) <- 1
obsCor
```

```
## [,1] [,2] [,3]
## [1,] 1.0 0.5 0.5
## [2,] 0.5 1.0 0.5
## [3,] 0.5 0.5 1.0
```

```
# wls will work fine for this. but what if there are scaling differences?
obsSDs <- diag(c(10, 1, .1))
# this is like standardizing, but in reverse
obsCov <- obsSDs %*% obsCor %*% obsSDs
obsCov
```

```
## [,1] [,2] [,3]
## [1,] 100.0 5.00 0.50
## [2,] 5.0 1.00 0.05
## [3,] 0.5 0.05 0.01
```

```
# in case you don't trust me
cov2cor(obsCov)
```

```
## [,1] [,2] [,3]
## [1,] 1.0 0.5 0.5
## [2,] 0.5 1.0 0.5
## [3,] 0.5 0.5 1.0
```

If we apply a basic least squares estimator to the above covariance matrix, we’re effectively deciding that variable 1 is 100-10,000 times more important than variable 3. There are many things you could try (i.e., standardizing your data), but there’s a more general solution: weight each element of \(S\) and \(C\) to correct for scaling differences. This is *weighted least squares* estimation, or simply WLS. Its fit function is:

\[ \mathrm{WLS} = (s - c)' W^{-1} (s - c) \]

where \(s\) and \(c\) are our observed and expected parameters turned into vectors, and \(W\) is a matrix of weights that corrects for these scaling differences.

We can swap out what our weight matrix \(W\) in a few ways:

If W is identity, then this is unweighted least squares, or ULS.

If we want a simple correction, we make the weights proportional to the standard errors of each parameter (which we can get out of hetcor or whatever polychoric estimator we have). If one variance is 100 and the other is .01 from the same data, their standard errors should be similarly far apart. When we make a diagonal matrix of the sampling variances (\(SE^2\)), this is *diagonally weighted least squares*, called either DLS or DWLS.

If we’re really fancy, we can generate not only the sampling variance of each parameter, but all of the sampling covariances. We never use the sampling covariances directly, but when we calculate standard errors, we generate this entire sampling matrix (called an asymptotic covariance matrix), then take the diagonals to generate standard errors. This ventures into a lot of math, and we’re around line 450 for a 75 minute lecture, so we’ll stop there. This full weight matrix should give slightly better estimates, as long as you have enough data to estimate not only the standard errors/sampling variances, but all of the sampling covariances as well. This is full *weighted least squares*, or WLS.

Which should you use? WLS if possible, DLS if not.
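To see why weighting rescues the badly scaled example from earlier, here is a hedged base-R sketch. The element-wise weights (products of the two variables' variances) are a crude stand-in for real sampling variances, chosen only to show the mechanics:

```r
# the badly scaled covariance matrix from earlier
obsCov <- matrix(c(100, 5,   .5,
                   5,   1,   .05,
                   .5,  .05, .01), 3, 3)
expCov <- diag(diag(obsCov))          # a "model" with no covariances at all
vech   <- function(m) m[lower.tri(m, diag = TRUE)]
resid  <- vech(obsCov) - vech(expCov)
# crude stand-in weights: the sampling variance of a covariance scales
# (roughly) with the product of the two variables' variances
wts    <- vech(outer(diag(obsCov), diag(obsCov)))
sum(resid^2)        # unweighted: 25.2525, dominated by the variance-100 variable
sum(resid^2 / wts)  # weighted: 0.75, each missing covariance contributes 0.25
```

After weighting, the three residuals contribute equally, which is exactly right here: all three underlying correlations were .5.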

Back in my day, WLS was hard. You had to build the asymptotic covariance matrix (ACM) yourself. Now, you swap out mxFitFunctionML for mxFitFunctionWLS and everything is done for you.

```
?mxFitFunctionWLS
# update your model with the new fit function
joint2WLS <- mxModel(jointRes,
                     mxFitFunctionWLS()
)
# run it
wlsRes <- mxRun(joint2WLS)
```

```
## Running One Factor Six Items with 16 parameters
```

```
summary(wlsRes)
```

```
## Summary of One Factor Six Items
##
## free parameters:
##                                     name     matrix row col   Estimate
## 1                              lambda_x1          A   1   7 -0.5367832
## 2                              lambda_x2          A   2   7 -0.4458025
## 3                              lambda_x3          A   3   7 -0.4486570
## 4                              lambda_z1          A   4   7 -1.3583865
## 5                              lambda_z2          A   5   7 -1.1771761
## 6                              lambda_z3          A   6   7 -1.4898833
## 7                                     e1          S   1   1  0.3318199
## 8                                     e2          S   2   2  0.4052815
## 9                                     e3          S   3   3  0.3985408
## 10                                    m1          M   1   1  3.0047124
## 11                                    m2          M   1   2  3.0238321
## 12                                    m3          M   1   3  2.9543414
## 13  One Factor Six Items.Thresholds[1,1] Thresholds   1   1 -1.4735133
## 14  One Factor Six Items.Thresholds[1,2] Thresholds   1   2  0.1102402
## 15  One Factor Six Items.Thresholds[1,3] Thresholds   1   3 -0.8110467
## 16  One Factor Six Items.Thresholds[2,3] Thresholds   2   3  2.3614460
##     Std.Error
## 1  0.04481988
## 2  0.04326880
## 3  0.04508364
## 4  0.22895182
## 5  0.15760630
## 6  0.21504629
## 7  0.03461112
## 8  0.03501293
## 9  0.03440987
## 10 0.04446026
## 11 0.04513189
## 12 0.04385230
## 13 0.18451417
## 14 0.08603092
## 15 0.13132228
## 16 0.25709873
##
## Model Statistics:
##                |  Parameters  |  Degrees of Freedom  |  Fit (r'Wr units)
##        Model:              16                   2974             175.95
##    Saturated:              27                   2963               0.00
## Independence:              12                   2978                 NA
## Number of observations/statistics: 500/2990
##
## chi-square: χ² ( df=9 ) = 175.95, p = 3.54794e-33
## CFI: NA
## TLI: NA (also known as NNFI)
## RMSEA: 0.1926136 [95% CI (0.1635804, 0.2227175)]
## Prob(RMSEA <= 0.05): 0
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2022-02-05 09:33:29
## Wall clock time: 0.02041411 secs
## optimizer: CSOLNP
## OpenMx version number: 2.14.11
## Need help? See help(mxSummary)
```

```
# what on earth is in the data?
# this is commented out because it prints a whole dataset
#wlsRes$data
```

Notes:

There is a lot more stuff in your data. ACM and the weights are all there.

There are options to mxFitFunctionWLS; its type argument lets you choose among the weighting schemes above (full WLS, DWLS, or ULS).

If your data are large, you may want to take the data object from your fitted WLS model and use it elsewhere to save on the recomputation of the ACM.
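As a hedged sketch of that last note (the slot access and object names here are assumptions based on this example, so check your own fitted model):

```r
# hedged sketch: reuse the data object from the fitted WLS model, which
# carries the already-computed observed statistics and weights, so a new
# model doesn't have to recompute the ACM from the raw data.
# `jointRes` and `wlsRes` are the models from above.
joint3WLS <- mxModel(jointRes,
                     wlsRes$data,      # precomputed stats ride along
                     mxFitFunctionWLS())
# wls3Res <- mxRun(joint3WLS)
```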

Pros include:

Your model with 30 categorical variables will converge before you die.

Scales very well with the number of variables.

When your model is good, it works almost as well as ML, and in seconds.

Did I mention it’s faster?

Works well when we violate normality assumptions (e.g., highly skewed data).

You get to read one of the million ML vs WLS papers out there.

Cons include:

There are many more methods for generating weight matrices, which were seemingly named by throwing those letter magnets from your parents’ fridge at a wall.

As specified, we can’t do chi-square tests, but there are corrections.

Similarly, all fit stats are (in theory) slightly off, especially for ULS. That’s OK for relative fit, but not for absolute.

Covariance matrices and ACMs are built piece by piece (just like cor), and thus aren’t guaranteed to be positive definite.

- This is doubly true for the WLS full weight matrix. Remember our example with 555 parameters? The ACM is 555 by 555 (and symmetric), yielding 555 variances and \(\frac{555*554}{2} = 153,735\) covariances, built pairwise. There are going to be some problems with finite samples.
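A quick base-R illustration of that positive-definiteness problem: each pairwise correlation below is perfectly legal on its own, but the matrix they form jointly is impossible (the numbers are hypothetical, picked to make the failure obvious):

```r
# three pairwise correlations, each fine in isolation
badCor <- matrix(c(  1,  .9, -.9,
                    .9,   1,  .9,
                   -.9,  .9,   1), 3, 3)
eigen(badCor)$values  # 1.9, 1.9, -0.8: the negative eigenvalue means not PD
```

Build a matrix like this pairwise from real data and your optimizer will have a very bad day.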