Refactor Sem multi-group support by alyst · Pull Request #317 · StructuralEquationModels/StructuralEquationModels.jl

alyst · 2026-03-09T22:47:46Z

This is the largest remaining part of #193; it changes some interfaces.

Refactoring of the SEM types

AbstractLoss is the base type for all loss functions
SemLoss{O,I} <: AbstractLoss is the base type for all SEM losses; it is now required to have observed::O and implied::I fields
Since the SemLoss constructor should always be given observed and implied (as positional arguments), the meanstructure keyword is gone -- the loss should always respect the implied specification.
LossTerm is a thin wrapper around AbstractLoss that adds optional id of the loss term and optional weight
Sem is a container of LossTerm objects (accessible via loss_terms(sem), or loss_term(sem, id)), so it can handle multiple SEM terms (accessible via sem_terms(sem) -- subset of loss_terms(sem), or sem_term(sem, id)).
It replaces both the old Sem and SemEnsemble.
AbstractSingleSem, AbstractSemCollection and SemEnsemble are gone.
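
As a rough illustration of the hierarchy described above, here is a minimal sketch. It only reuses the names AbstractLoss, SemLoss, LossTerm, loss_terms, sem_terms and loss_term from the description; ToyML, ToyRidge, ToySem, issemloss as written here, and all field layouts are hypothetical stand-ins, not the definitions in this PR.

```julia
# Hypothetical stand-ins that mirror the description; not the package's code.
abstract type AbstractLoss end

# Every SEM loss carries its observed data and its implied model.
abstract type SemLoss{O,I} <: AbstractLoss end

struct ToyML{O,I} <: SemLoss{O,I}
    observed::O
    implied::I
end

# A non-SEM loss (for example a regularization penalty) has neither field.
struct ToyRidge <: AbstractLoss
    α::Float64
end

# Thin wrapper that adds an optional id and an optional weight.
struct LossTerm{L<:AbstractLoss}
    loss::L
    id::Union{Symbol,Nothing}
    weight::Union{Float64,Nothing}
end

# Container of loss terms (stand-in for the new Sem).
struct ToySem
    loss_terms::Vector{LossTerm}
end

loss_terms(sem::ToySem) = sem.loss_terms
issemloss(term::LossTerm) = term.loss isa SemLoss
# SEM terms are the subset of loss terms that wrap a SemLoss.
sem_terms(sem::ToySem) = filter(issemloss, loss_terms(sem))

function loss_term(sem::ToySem, id::Symbol)
    i = findfirst(t -> t.id === id, loss_terms(sem))
    i === nothing && throw(KeyError(id))
    return loss_terms(sem)[i]
end

sem = ToySem(LossTerm[
    LossTerm(ToyML([1.0 0.2; 0.2 1.0], :ram_g1), :g1, nothing),
    LossTerm(ToyML([1.0 0.3; 0.3 1.0], :ram_g2), :g2, nothing),
    LossTerm(ToyRidge(0.1), :ridge, 0.5),
])
```

In this toy version the regularization term is part of loss_terms(sem) but not of sem_terms(sem), which is the distinction the description draws.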

Method changes

Multi-term SEMs can be created like this:

model = Sem(
    :Pasteur => SemML(obs_g1, RAMSymbolic(specification_g1)),
    :Grant_White => SemML(obs_g2, RAM(specification_g2)),
    ...
)

Or with weights specified:

model = Sem(
    :Pasteur => SemML(obs_g1, RAMSymbolic(specification_g1)) => 0.5,
    :Grant_White => SemML(obs_g2, RAM(specification_g2)) => 0.6,
)

The new Sem() and loss-term constructors rely less on keyword arguments and more on positional arguments, but some keyword support remains.
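
The `id => loss => weight` syntax works because `=>` is right-associative in Julia, so the expression is a Pair whose second element is itself a Pair. The sketch below shows one way such arguments could be unpacked. parse_term and total_objective are hypothetical helpers, the losses are plain closures, and treating a missing weight as 1 is an assumption rather than the documented behaviour of Sem().

```julia
# `:id => loss => w` parses as `:id => (loss => w)`.
# `parse_term` is a hypothetical helper, not part of the package.
function parse_term(p::Pair)
    id, rest = p.first, p.second
    if rest isa Pair
        return (id = id, loss = rest.first, weight = rest.second)
    end
    return (id = id, loss = rest, weight = nothing)
end

# Weighted sum of the per-term objectives; a missing weight counts as 1 (assumption).
total_objective(terms, θ) =
    sum(something(t.weight, 1.0) * t.loss(θ) for t in terms)

# Closures stand in for real loss objects.
terms = [parse_term(p) for p in (
    :Pasteur => (θ -> sum(abs2, θ)) => 0.5,
    :Grant_White => (θ -> sum(abs, θ)),
)]

objective = total_objective(terms, [1.0, -2.0])
```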

update_observed!() was removed. It was only used by replace_observed(),
but otherwise in-place model modification with unclear semantics is error-prone.
replace_observed(sem, data) was simplified by removing support of additional keywords or requirement to pass SEM specification.
It only creates a copy of the given Sem with the observed data replaced,
but implied and loss definitions intact.
Changing observed vars is not supported -- that is something use-case specific
that user should implement in their code.
check_single_lossfun() was renamed into check_same_semterm_type() as
it better describes what it does. If check is successful, it returns the specific
subtype of SemLoss.
bootstrap() and se_bootstrap() use bootstrap!(acc::BootstrapAccumulator, ...)
function to reduce code duplication
bootstrap() returns BootstrapResult{T} for better type inference
fit_measures() now also accepts vector of functions, and includes CFI by default (DEFAULT_FIT_MEASURES constant)
test_fitmeasures() was tweaked to absorb more of the repetitive code: calculating the subset of fit measures, comparing this subset against the lavaan references, and checking for measures that cannot be applied to the given loss types (SemWLS).
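
To illustrate the "vector of functions" idea for fit measures, here is a small sketch on toy types. ToyFit, the three toy measures, DEFAULT_MEASURES and toy_fit_measures are all hypothetical; the real fit_measures() signature and return type may differ.

```julia
# Toy fit result; all names here are hypothetical.
struct ToyFit
    chi2::Float64
    dof::Int
    n_par::Int
end

chi2(fit::ToyFit) = fit.chi2
dof(fit::ToyFit) = fit.dof
n_par(fit::ToyFit) = fit.n_par

# Default list of measures, analogous in spirit to a DEFAULT_FIT_MEASURES constant.
const DEFAULT_MEASURES = [chi2, dof, n_par]

# Evaluate each measure and key the result by the function's name.
toy_fit_measures(fit, measures::AbstractVector = DEFAULT_MEASURES) =
    Dict(nameof(f) => f(fit) for f in measures)

fit = ToyFit(12.5, 8, 21)
all_measures = toy_fit_measures(fit)
only_chi2 = toy_fit_measures(fit, [chi2])
```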

Maximilian-Stefan-Ernst · 2026-03-24T16:25:45Z

docs/src/tutorials/collection/collection.md

+In this case, [`FiniteDiffWrapper`](@ref) method to generate a wrapper around the specific `SemLoss` term that only uses its objective
+to calculate the gradient using the finite difference approximation.


Suggested change

-In this case, [`FiniteDiffWrapper`](@ref) method to generate a wrapper around the specific `SemLoss` term that only uses its objective
-to calculate the gradient using the finite difference approximation.
+In this case, [`FiniteDiffWrapper`](@ref) can be used to generate a wrapper around the specific `SemLoss` term. This wrapper only uses the `LossTerm`'s objective, and calculates the gradient using finite difference approximation.

alyst · 2026-03-24T17:06:17Z

@Maximilian-Stefan-Ernst It might be a nice idea to use copilot for catching typos, incorrect sentences, but also potential bugs.
I cannot select copilot as a reviewer -- I'm not exactly sure why, whether it is the organization/repository-level setting, or it's my status in the repository.
But I'm also fine if SEM.jl is kept AI-free :)

Maximilian-Stefan-Ernst · 2026-03-25T12:54:12Z

src/frontend/fit/fitmeasures/chi2.jl

-############################################################################################
+function χ²(fit::SemFit, model::AbstractSem)
+    terms = sem_terms(model)
+    isempty(terms) && return 0.0


Maybe we should throw an error for a Sem with no terms?

I think the Sem constructor should throw an exception if there are no SEM terms.
Returning 0 here seems legit if there is no data, but we can also change it into @assert.
Throwing an exception in situations like this will just add a lot of redundant code.
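
The "fail at construction" option mentioned above could look roughly like the following sketch. CheckedSem and total_chi2 are hypothetical, and the per-term χ² contributions are plain numbers here; the point is only that a single check in the constructor lets downstream code assume at least one term.

```julia
# Hypothetical container; validates once, at construction time.
struct CheckedSem
    term_chi2::Vector{Float64}   # stand-in for the per-term χ² contributions
    function CheckedSem(terms)
        isempty(terms) &&
            throw(ArgumentError("a Sem needs at least one SEM term"))
        return new(collect(Float64, terms))
    end
end

# No empty-case guard needed here: the constructor already rejected it.
total_chi2(model::CheckedSem) = sum(model.term_chi2)

model = CheckedSem([1.5, 2.0])
```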

Maximilian-Stefan-Ernst · 2026-03-25T21:58:43Z

src/frontend/specification/Sem.jl

+        # FIXME remove this implicit logic
+        # SemWLS only accepts vech-ed implied covariance
+        if isa(loss, Type) && (loss <: SemWLS) && !haskey(kwargs, :vech)
+            implied_kwargs = copy(kwargs)
+            implied_kwargs[:vech] = true
+        else
+            implied_kwargs = kwargs
+        end


I think before this was handled inside RAMSymbolic - is there a reason to move it here?

My original idea was to move away from passing the kwargs... around.
It causes various issues, adds maintenance overhead, and interferes with the modularity/extensibility goal.
Instead, the constructors should support their specific keywords only.
This PR still retains kwargs... passthrough for compatibility, but moves all "parameter compatibility" logic to the Sem construction.

With the old logic, the user only specifies the implied and loss types, and SEM.jl figures out how to tweak their parameters to make them compatible.
RAMSymbolic/SemWLS is one such example, where RAMSymbolic decides how to construct itself knowing how it will be used.
In this PR, the implied constructors are agnostic of the loss functions they will be used in.
The loss constructor will throw an exception if the implied object is not compatible.

In principle the logic here could be removed -- the user should be able to easily fix the SEM construction based on the error message thrown by SemWLS.

That sounds reasonable - I think the error message in SemWLS is good, and we can remove the logic here.
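
A minimal sketch of the direction agreed on here: the implied object knows nothing about the loss, and the loss constructor rejects an incompatible implied object with an actionable message. VechImplied and VechOnlyLoss are hypothetical toy types, and the message text is invented for illustration; it is not the actual SemWLS error.

```julia
# Toy implied object; `vech` records whether the covariance is half-vectorized.
struct VechImplied
    vech::Bool
end

# Toy loss that, like the WLS case discussed above, only accepts vech-ed input.
struct VechOnlyLoss
    implied::VechImplied
    function VechOnlyLoss(implied::VechImplied)
        implied.vech || throw(ArgumentError(
            "VechOnlyLoss needs a vech-ed implied covariance; " *
            "construct the implied object with vech = true"))
        return new(implied)
    end
end

ok = VechOnlyLoss(VechImplied(true))
```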

Maximilian-Stefan-Ernst · 2026-03-29T19:59:32Z

src/objective_gradient_hessian.jl

+    for term in loss_terms(model)
+        issemloss(term) && update!(targets, implied(term), params)
+    end


Could we allow different loss functions to share the same implied term? And have something like

Suggested change

-    for term in loss_terms(model)
-        issemloss(term) && update!(targets, implied(term), params)
-    end
+    implied_terms = unique([implied(term) for term in loss_terms(model)])
+    for term in implied_terms
+        update!(targets, term, params)
+    end

to avoid repeated updating of the same implied term?

Interesting idea! BTW, what's the use case for the terms that use overlapping input data?
What should be the syntax for specifying the shared terms?
Does it mean that the implied/observed objects should have their own IDs, and it is possible to have many-to-many relationship between implied and observed, and many-to-one between the implied and the loss terms?
I think it should be possible to implement it on top of this PR, but maybe as a part of a follow-up PR. That PR can also add test cases to make sure it works as intended.
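
One way the "update each shared implied object once" idea could be sketched, keeping the SEM-term filter from the original loop and deduplicating by object identity. CountingImplied, ToyTerm, issem and update_implied! are hypothetical; the counter exists only to make the number of updates observable.

```julia
# Toy implied object that counts how often it is updated.
mutable struct CountingImplied
    n_updates::Int
end
update!(imp::CountingImplied, params) = (imp.n_updates += 1; imp)

# Toy loss term; `nothing` marks a term without an implied object (non-SEM loss).
struct ToyTerm
    implied::Union{CountingImplied,Nothing}
end
issem(term::ToyTerm) = term.implied !== nothing

function update_implied!(terms, params)
    seen = Set{UInt}()                  # object ids of implied objects already updated
    for term in terms
        issem(term) || continue         # skip non-SEM terms, as in the original loop
        imp = term.implied
        id = objectid(imp)
        id in seen && continue          # shared object: update only once
        push!(seen, id)
        update!(imp, params)
    end
    return nothing
end

shared = CountingImplied(0)
other = CountingImplied(0)
terms = [ToyTerm(shared), ToyTerm(shared), ToyTerm(other), ToyTerm(nothing)]
update_implied!(terms, [0.1, 0.2])
```

Deduplicating by identity rather than by value avoids merging two distinct implied objects that merely happen to compare equal.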

Maximilian-Stefan-Ernst · 2026-03-29T20:41:17Z

Thank you a lot for those changes, @alyst! I have a few high level points before I review in detail:

One feature that is lost is sharing the same implied object among multiple loss functions. This could maybe be useful if multiple loss terms depend on the model-implied covariance matrix. I think with some small adaptations this could still work though (i.e. when calling update!, just updating all the unique implied terms across all loss functions), and I made a comment about it in the code.
I am not sure if I'm happy with removing update_observed completely - one of the biggest strengths of the package is fitting many models in succession to different datasets, and without update_observed your life might be quite a bit harder doing that. For example, bootstrapping for SemWLS now does not work correctly anymore, since the weight matrix V has to be updated for each new dataset.

Let me know what you think of that!

alyst · 2026-03-29T21:47:10Z

@Maximilian-Stefan-Ernst Thank you for the review! I think these are very valid points.

One feature that is lost is sharing the same implied object among multiple loss functions. This could maybe be useful if multiple loss terms depend on the model-implied covariance matrix. I think with some small adaptations this could still work though (i.e. when calling update!, just updating all the unique implied terms across all loss functions), and I made a comment about it in the code.

Internally, it is easy to implement. I'm not sure about the user-facing API. The current approach is to pass only the types of the objects and construct them using the keyword parameters that are broadcast to all constructed elements of the SEM.
I think it is not ideal for maintenance and extensibility in the long term: what if some kwarg names overlap, especially for 3rd-party implied/loss objects? I think SEM.jl is already using prefixed names to avoid such situations, but to me that signals some design limitations.
I am thinking of some @SEM macro that would be very similar to the Sem() constructor, but will allow defining the structure of the Sem model before all the elements are created, e.g.

@SEM(
   # implied definitions
   [:implied1 => RAM(...),
   :implied2 => RAMSymbolic(...)
   ],
   # loss term definitions
   [:loss1 => SemML(:implied1, ...), # instead of passing RAM() object directly,
   :loss2 => SemFIML(:implied1, ...), # reusing the same implied object
   :loss3 => SemWLS(:implied2, ...),
   ....
   ],
)

It would expand into code that first builds the RAM objects, then substitutes their references in the loss term construction with the actual implied objects, and finally constructs the SEM from the loss objects.
But that is a substantial update to the API.
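
The reference-substitution step described above does not strictly need a macro; a plain function can show the idea. This is a sketch under assumptions: build_terms, NamedImplied and NamedLoss are hypothetical, and losses are specified as `id => (kind, implied_id)` purely for illustration.

```julia
# Toy implied and loss types; mutable so that object identity is meaningful.
mutable struct NamedImplied
    name::String
end

struct NamedLoss
    kind::Symbol
    implied::NamedImplied
end

function build_terms(implied_defs, loss_defs)
    # 1. register every implied object once under its id
    implied = Dict(id => obj for (id, obj) in implied_defs)
    # 2. replace each symbolic reference by the actual implied object
    return [id => NamedLoss(kind, implied[ref]) for (id, (kind, ref)) in loss_defs]
end

terms = build_terms(
    [:implied1 => NamedImplied("RAM"), :implied2 => NamedImplied("RAMSymbolic")],
    [
        :loss1 => (:ML, :implied1),
        :loss2 => (:FIML, :implied1),   # reuses the same implied object as loss1
        :loss3 => (:WLS, :implied2),
    ],
)
```

A reference to an unknown implied id fails with a KeyError in this sketch; a real implementation would probably want a more descriptive error.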

I might have overlooked the implied objects sharing, because I am not using this feature myself (I was more focused on multi-group and regularization -- the implied objects share some parameters, but are not identical).
Is it something that is used in the paper or in the tutorials?

I am not sure if I'm happy with removing update_observed completely - one of the biggest strengths of the package is fitting many models in succession to different datasets, and without update_observed your life might be quite a bit harder doing that. For example, bootstrapping for SemWLS now does not work correctly anymore, since the weight matrix V has to be updated for each new dataset.

The replace_observed() is still there, and it is possible to implement custom replace_observed() for specific types of loss/implied objects.
The constraint is that replace_observed() produces a (shallow) copy of the original Sem object, and does not change the latent/observed variables or the set of parameters, because to me that is a big "can of worms".
Let me know if you already know what's the problem with the SemWLS update in the current PR -- I think it should be possible to fix it with the proposed replace_observed() constraints.
I guess, it is about overloading the replace_observed() for SemWLS, which will recalculate the weights matrices given the new observable.

This case actually highlights one of the issues that I wanted to address. For the bootstrap, replace_observed() should only accept the new observed data, and the elements of Sem have to figure out how to update themselves to it.
So if there are different ways to calculate the weights, the configuration has to be saved in the SemWLS object, and the weight calculation then has to be repeated with the same method in the replace_observed() call.
Otherwise the correct way of calling bootstrap() or replace_observed() would depend on the particular design of the SEM model, which is not convenient for the user.

For the broader updates that change the model structure or the configuration of individual elements,
I think the user has to build the new Sem object manually -- there are just too many possibilities and corner cases there for SEM.jl to handle by a single replace/update_observed() call.
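
The bootstrap pattern discussed above can be sketched on a toy model: the replace function receives only the new data and returns a new object, and a mutable accumulator collects the results. ToyModel, fit_params, MeanSeAccumulator and the bootstrap! signature used here are hypothetical and far simpler than the BootstrapAccumulator mentioned in the PR description.

```julia
using Random, Statistics

# Toy model: the "fit" is just the column means of the observed data.
struct ToyModel
    data::Matrix{Float64}
end
# Returns a new object; the original model is left untouched.
replace_observed(m::ToyModel, data) = ToyModel(data)
fit_params(m::ToyModel) = vec(mean(m.data; dims = 1))

# Running sums are enough for the bootstrap mean and standard error.
mutable struct MeanSeAccumulator
    n::Int
    sum::Vector{Float64}
    sumsq::Vector{Float64}
end
MeanSeAccumulator(p::Int) = MeanSeAccumulator(0, zeros(p), zeros(p))

function bootstrap!(acc::MeanSeAccumulator, model::ToyModel;
                    n_boot::Int = 200, rng = Random.default_rng())
    n = size(model.data, 1)
    for _ in 1:n_boot
        rows = rand(rng, 1:n, n)        # resample rows with replacement
        θ = fit_params(replace_observed(model, model.data[rows, :]))
        acc.n += 1
        acc.sum .+= θ
        acc.sumsq .+= θ .^ 2
    end
    return acc
end

boot_mean(acc) = acc.sum ./ acc.n
# Uses the 1/n (population) variance; clamped at zero against rounding error.
boot_se(acc) = sqrt.(max.(acc.sumsq ./ acc.n .- boot_mean(acc) .^ 2, 0.0))

model = ToyModel(randn(MersenneTwister(1), 50, 2))
acc = bootstrap!(MeanSeAccumulator(2), model; n_boot = 100, rng = MersenneTwister(2))
```

Because the caller never passes anything but data, the same bootstrap loop works regardless of which loss types the model contains, which is the convenience argued for above.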

alyst · 2026-03-30T17:50:00Z

Ah, another consideration about sharing the implied term -- as we discussed, RAMSymbolic has to be vech=true for SemWLS and vech=false for SemML.
So for certain configurations of implied and loss types it is not possible to share the implied types.
That probably means that there should be two independent implied objects for the shared observed data as a workaround.
I don't see an easy way to automate such workarounds, so in these situations the user has to resort to manual construction.
It should still be possible to automate the construction of shared implied objects in "normal" cases.
What's not clear to me is how critical it is to implement this automation vs manual construction.

codecov · 2026-03-31T01:12:56Z

Codecov Report

❌ Patch coverage is 78.80911% with 121 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.90%. Comparing base (1f8d2a9) to head (293c88b).
⚠️ Report is 78 commits behind head on devel.

Files with missing lines	Patch %	Lines
src/frontend/specification/Sem.jl	66.52%	80 Missing ⚠️
src/frontend/fit/standard_errors/bootstrap.jl	82.53%	11 Missing ⚠️
src/frontend/finite_diff.jl	60.00%	8 Missing ⚠️
src/objective_gradient_hessian.jl	84.61%	6 Missing ⚠️
src/frontend/pretty_printing.jl	0.00%	5 Missing ⚠️
src/frontend/fit/fitmeasures/fit_measures.jl	0.00%	3 Missing ⚠️
src/additional_functions/simulation.jl	90.90%	1 Missing ⚠️
src/additional_functions/start_val/start_fabin3.jl	80.00%	1 Missing ⚠️
src/frontend/fit/fitmeasures/CFI.jl	94.44%	1 Missing ⚠️
src/frontend/fit/fitmeasures/chi2.jl	93.33%	1 Missing ⚠️
... and 4 more

Additional details and impacted files

@@            Coverage Diff             @@
##            devel     #317      +/-   ##
==========================================
+ Coverage   71.83%   73.90%   +2.06%     
==========================================
  Files          51       57       +6     
  Lines        2223     2437     +214     
==========================================
+ Hits         1597     1801     +204     
- Misses        626      636      +10

alyst · 2026-03-31T04:49:03Z

@Maximilian-Stefan-Ernst I've added a kwargs mechanism, and specifically the recompute_observed_state kwarg (true by default), to the replace_observed() method.
It is designed to be a universal parameter (not specific to a particular SemLoss) that specifies whether the elements of the Sem object have to update their internal states associated with the observed data to match the new observed, or whether they should preserve the old state associated with the original observed.
For SemWLS, it means updating the weight matrices using the new observed.
But it looks like for the bootstrap and SemWLS, it has to be recompute_observed_state=true (as it was originally) for the tests to pass.

I've also fixed the observed initialization for ensemble SEMs, which fixed the CFI failure, so now all the tests pass.
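
A toy sketch of what such a flag could mean for a loss term that caches data-derived state. WeightedTerm and compute_weights are hypothetical; only the replace_observed name and the recompute_observed_state keyword come from the comment above, and the real SemWLS weights are matrices rather than the inverse variances used here.

```julia
using Statistics

# Toy term whose weights are derived from the observed data.
struct WeightedTerm
    data::Matrix{Float64}
    weights::Vector{Float64}
end
compute_weights(data) = 1 ./ vec(var(data; dims = 1))
WeightedTerm(data::AbstractMatrix) = WeightedTerm(data, compute_weights(data))

# Always returns a new term; the flag decides whether the cached state follows the data.
function replace_observed(term::WeightedTerm, data; recompute_observed_state::Bool = true)
    weights = recompute_observed_state ? compute_weights(data) : term.weights
    return WeightedTerm(data, weights)
end

old_term = WeightedTerm([1.0 2.0; 3.0 6.0; 5.0 4.0])
new_data = [1.0 1.0; 2.0 3.0; 4.0 9.0]
fresh = replace_observed(old_term, new_data)
kept = replace_observed(old_term, new_data; recompute_observed_state = false)
```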

alyst changed the base branch from main to devel March 9, 2026 22:48

alyst force-pushed the refactor_sem_terms branch from 3c39941 to 32cea82 Compare March 11, 2026 20:31

Alexey Stukalov and others added 3 commits March 21, 2026 11:03

Project.toml: support Symbolics v7 & Utils v4

9c516fd

prepare_start_params(): tighten type check

6e1ffaa

SemImplied/SemLossFun: drop meanstructure kwarg

32068de

- for SemImplied require spec::SemSpec as positional - for SemLossFunction require implied argument

alyst force-pushed the refactor_sem_terms branch 2 times, most recently from eb039a2 to 88a1ff0 Compare March 23, 2026 07:33

alyst changed the title ~~Refactor Sem mult-group support~~ Refactor Sem multi-group support Mar 23, 2026

alyst marked this pull request as ready for review March 23, 2026 08:12

alyst force-pushed the refactor_sem_terms branch from 88a1ff0 to 0406f29 Compare March 23, 2026 17:51

Maximilian-Stefan-Ernst reviewed Mar 24, 2026

Maximilian-Stefan-Ernst reviewed Mar 25, 2026

src/frontend/fit/fitmeasures/chi2.jl (outdated, resolved)

Maximilian-Stefan-Ernst reviewed Mar 25, 2026

src/frontend/fit/fitmeasures/chi2.jl (outdated, resolved)

Maximilian-Stefan-Ernst reviewed Mar 25, 2026

src/loss/abstract.jl (resolved)

alyst and others added 10 commits March 25, 2026 17:38

refactor Sem, SemEnsemble, SemLoss

e81cec0

params/param_labels(): use both as synonyms for now

bab1317

check_same_semterm_type(): refactor check_single_lossfun()

f7f7452

update multi-group correction

961a3c8

deduplicate the correction scale methods and move to Sem.jl

replace_observed(): simplify & refactor

a9ee00b

remove update_observed!()

bootstrap: sync with Sem updates

84c6653

CFI: sync with Sem refactor

24261d5

test/build_models: remove redundant model

e4d38e5

revert using

cb9b1e7

WLS: verbose option

afac0b4

to suppress info about inv(obs_cov)

alyst force-pushed the refactor_sem_terms branch from 0406f29 to 0096211 Compare March 26, 2026 00:41

Alexey Stukalov added 2 commits March 25, 2026 17:53

docs: sync with Sem refactor

53a615a

test: fix formatting

240e3cd

Alexey Stukalov added 3 commits March 25, 2026 17:53

fit_measures(): support vectors of funcs

a277cb0

also add CFI to the list

test_fitmeasures(): refactor/simplify

60dbdc7

test/multigroup: small tweaks

05abcd9

alyst force-pushed the refactor_sem_terms branch from 0096211 to 05abcd9 Compare March 26, 2026 00:53

Maximilian-Stefan-Ernst reviewed Mar 29, 2026

Alexey Stukalov added 6 commits March 30, 2026 15:51

finite_diff: replace_observed()

91d6f47

calls replace_observed() for the underlying term

replace_observed(): support kwargs

bfd32b4

replace_observed(SemWLS, ...; update_internal_state)

690d248

the kwarg specifies whether to recalculate weights

tests/model: replace_observed() kwargs passing

b41e75b

replace_observed(...; recompute_obs_state=true)

b5e920a

tests/model: test multi-group data ctor

293c88b

alyst force-pushed the refactor_sem_terms branch from 33b9243 to 293c88b Compare March 31, 2026 00:57
