Refactor Sem multi-group support#317
Refactor Sem multi-group support#317alyst wants to merge 24 commits intoStructuralEquationModels:develfrom
Conversation
3c39941 to
32cea82
Compare
- for SemImplied require spec::SemSpec as positional - for SemLossFunction require implied argument
eb039a2 to
88a1ff0
Compare
88a1ff0 to
0406f29
Compare
| In this case, [`FiniteDiffWrapper`](@ref) method to generate a wrapper around the specific `SemLoss` term that only uses its objective | ||
| to calculate the gradient using the finite difference approximation. |
There was a problem hiding this comment.
| In this case, [`FiniteDiffWrapper`](@ref) method to generate a wrapper around the specific `SemLoss` term that only uses its objective | |
| to calculate the gradient using the finite difference approximation. | |
| In this case, [`FiniteDiffWrapper`](@ref) can be used to generate a wrapper around the specific `SemLoss` term. This wrapper only uses the `LossTerm`s objective, and calculates the gradient using finite difference approximation. |
|
@Maximilian-Stefan-Ernst It might be a nice idea to use copilot for catching typos, incorrect sentences, but also potential bugs. |
src/frontend/fit/fitmeasures/chi2.jl
Outdated
| ############################################################################################ | ||
| function χ²(fit::SemFit, model::AbstractSem) | ||
| terms = sem_terms(model) | ||
| isempty(terms) && return 0.0 |
There was a problem hiding this comment.
Maybe we should throw an error for a Sem with no terms?
There was a problem hiding this comment.
I think the Sem constructor should throw an exception if there are no SEM terms.
Returning 0 here seems legit if there is no data, but we can also change it into @assert.
Throwing an exception in situations like this will just add a lot of redundant code.
| # FIXME remove this implicit logic | ||
| # SemWLS only accepts vech-ed implied covariance | ||
| if isa(loss, Type) && (loss <: SemWLS) && !haskey(kwargs, :vech) | ||
| implied_kwargs = copy(kwargs) | ||
| implied_kwargs[:vech] = true | ||
| else | ||
| implied_kwargs = kwargs | ||
| end |
There was a problem hiding this comment.
I think before this was handled inside RAMSymbolic - is there a reason to move it here?
There was a problem hiding this comment.
My original idea was to move away from passing the kwargs... around.
It causes various issues, adds maintenance overhead, and interferes with the modularity/extensibility goal.
Instead, the constructors should support their specific keywords only.
This PR still retains kwargs... passthrough for compatibility, but moves all "parameter compatibility" logic to the Sem construction.
With the old logic the user only specifies the implied and loss types, and SEM.jl figures how to tweak their parameters to make them compatible.
RAMSymbolic/SemWLS is one such example, where RAMSymbolic decides how to construct itself knowing how it will be used.
In this PR, the implied constructors are agnostic of the loss functions they will be used in.
The loss constructor will throw an exception if the implied object is not compatible.
In principle the logic here could be removed -- the user should be able to easily fix the SEM construction based on the error message thrown by SemWLS.
There was a problem hiding this comment.
That sounds reasonable - I think the error message in SemWLS is good, and we can remove the logic here.
deduplicate the correction scale methods and move to Sem.jl
remove update_observed!()
to suppress info about inv(obs_cov)
0406f29 to
0096211
Compare
0096211 to
05abcd9
Compare
| for term in loss_terms(model) | ||
| issemloss(term) && update!(targets, implied(term), params) | ||
| end |
There was a problem hiding this comment.
Could we allow different loss functions to share the same implied term? And have something like
| for term in loss_terms(model) | |
| issemloss(term) && update!(targets, implied(term), params) | |
| end | |
| implied_terms = unique([implied(term) for term in loss_terms(model)]) | |
| for term in implied_terms | |
| update!(targets, term, params) | |
| end |
to avoid repeated updating of the same implied term?
There was a problem hiding this comment.
Interesting idea! BTW, what's the use case for the terms that use overlapping input data?
What should be the syntax for specifying the shared terms?
Does it mean that the implied/observed objects should have their own IDs, and it is possible to have many-to-many relationship between implied and observed, and many-to-one between the implied and the loss terms?
I think it should be possible to implement it on top of this PR, but maybe as a part of a follow-up PR. That PR can also add test cases to make sure it works as intended.
|
Thank you a lot for those changes, @alyst! I have a few high level points before I review in detail:
Let me know what you think of that! |
|
@Maximilian-Stefan-Ernst Thank you for the review! I think these are very valid points.
Internally, it is easy to implement. I'm not sure about the user-facing API. The current approach is to pass only the types of the objects and construct tehm using the keyword parameters that are broadcasted to all constructed elements of the SEM. @SEM(
# implied definitions
[:implied1 => RAM(...),
:implied2 => RAMSymbolic(...)
],
# loss term definitions
[:loss1 => SemML(:implied1, ...), # instead of passing RAM() object directly,
:loss2 => SemFIML(:implied1, ...), # reusing the same implied object
:loss3 => SemWLS(:implied2, ...),
....
],
)It will expand into a code that first builds RAM objects, and then substitutes their references in the loss term construction with the actual implied objects, and finally constructs the SEM using the loss objects. I might have overlooked the implied objects sharing, because I am not using this feature myself (I was more focused on multi-group and regularization -- the implied objects share some parameters, but are not identical).
The This case actually highlights one of the issues that I wanted to address. For the bootstrap, For the broader updates that change the model structure or the configuration of individual elements, |
|
Ah, another consideration about sharing the implied term -- as we discussed RAMSymbolic has to be |
calls replace_observed() for the underlying term
the kwarg specifies whether to recalculate weights
33b9243 to
293c88b
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## devel #317 +/- ##
==========================================
+ Coverage 71.83% 73.90% +2.06%
==========================================
Files 51 57 +6
Lines 2223 2437 +214
==========================================
+ Hits 1597 1801 +204
- Misses 626 636 +10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@Maximilian-Stefan-Ernst I've added kwargs mechanism, and specifically I've also fixed the observed initialization for ensemble SEMs, that fixed the CFI failure, so now all the tests pass. |
This is a largest remaining part of #193, which changes some interfaces.
Refactoring of the SEM types
AbstractLossis the base type for all functionsSemLoss{O,I} <: AbstractLossis the base type for all SEM losses, it now requires to have observed::O and implied::I fieldSemLossctor should always be given observed and implied (positional),meanstructurekeyword is gone -- loss should always respect implied specification.LossTermis a thin wrapper aroundAbstractLossthat adds optional id of the loss term and optional weightSemis a container ofLossTermobjects (accessible vialoss_terms(sem), orloss_term(sem, id)), so it can handle multiple SEM terms (accessible viasem_terms(sem)-- subset ofloss_terms(sem), orsem_term(sem, id)).It replaces both the old
SemandSemEnsemble.AbstractSingleSem,AbstractSemCollectionandSemEnsembleare gone.Method changes
Multi-term SEMs could be created like
Or with weights specification
The new Sem() and loss-term constructors rely less on keyword arguments and more on positional arguments, but some keywords support is present.
update_observed!()was removed. It was only used byreplace_observed(),but otherwise in-place model modification with unclear semantics is error-prone.
replace_observed(sem, data)was simplified by removing support of additional keywords or requirement to pass SEM specification.It only creates a copy of the given
Semwith the observed data replaced,but implied and loss definitions intact.
Changing observed vars is not supported -- that is something use-case specific
that user should implement in their code.
check_single_lossfun()was renamed intocheck_same_semterm_type()asit better describes what it does. If check is successful, it returns the specific
subtype of
SemLoss.bootstrap()andse_bootstrap()usebootstrap!(acc::BootstrapAccumulator, ...)function to reduce code duplication
bootstrap()returnsBootstrapResult{T}for better type inferencefit_measures()now also accepts vector of functions, and includesCFIby default (DEFAULT_FIT_MEASURESconstant)test_fitmeasures()was tweaked to handle more repetitive code: calculating the subset of fit measures, and compairing this subset against lavaan refs, checking for measures that could not be applied to given loss types (SemWLS).