Conversation
wildart
left a comment
There was a problem hiding this comment.
Great. Now we need to shape it into an acceptable API style.
src/MultivariateStats.jl
Outdated
| using LinearAlgebra | ||
| using SparseArrays | ||
| using Statistics: middle | ||
| using StatsAPI: RegressionModel | ||
| using StatsBase: | ||
| SimpleCovariance, | ||
| CovarianceEstimator, | ||
| AbstractDataTransform, | ||
| ConvergenceException, | ||
| pairwise, | ||
| pairwise!, | ||
| CoefTable | ||
|
|
||
| import Statistics: mean, var, cov, covm, cor | ||
| import Base: length, size, show | ||
| import StatsAPI: fit, predict, coef, weights, dof, r2 | ||
| import LinearAlgebra: eigvals, eigvecs | ||
|
|
||
| export |
There was a problem hiding this comment.
Was it automatic styling? Why the format change?
|
|
||
| # whiten | ||
| Whitening, # Type: Whitening transformation | ||
|
|
There was a problem hiding this comment.
These new lines helped to separate (visually) various algorithms & their methods.
src/mca.jl
Outdated
| """ | ||
| ca | ||
|
|
||
| Fit a correspondence analysis using the data array `X` whose rows are | ||
| the objects and columns are the variables. The first `d` components | ||
| are retained. | ||
| """ | ||
| function ca(X, d) | ||
| c = fit(CA, X, d) | ||
| return c | ||
| end |
There was a problem hiding this comment.
There is no need for this high level function, standard API presume fit call for construction a model.
src/mca.jl
Outdated
| ca.G = Wc \ Q * Dq | ||
|
|
||
| # Get the eigenvalues | ||
| ca.I = D .^ 2 |
There was a problem hiding this comment.
fit call should always return model
src/mca.jl
Outdated
| function CA(X) | ||
|
|
||
| # Convert to proportions | ||
| X = X ./ sum(X) | ||
|
|
||
| # Calculate row and column means | ||
| r = sum(X, dims = 2)[:] | ||
| c = sum(X, dims = 1)[:] | ||
|
|
||
| # Center the data matrix to create residuals | ||
| R = X - r * c' | ||
|
|
||
| # Standardize the data matrix to create standardized residuals | ||
| Wr = Diagonal(sqrt.(r)) | ||
| Wc = Diagonal(sqrt.(c)) | ||
| SR = Wr \ R / Wc | ||
|
|
||
| T = eltype(X) | ||
| return CA(X, R, r, c, SR, zeros(T, 0, 0), zeros(T, 0, 0), zeros(T, 0)) | ||
| end |
There was a problem hiding this comment.
The algorithm code should be in the fit function. There is no need for special constructor of the model.
src/mca.jl
Outdated
|
|
||
| # Recoding dictionary, maps each distinct value in z to | ||
| # an offset | ||
| rd = Dict{eltype(z),Int}() |
There was a problem hiding this comment.
Use function parametrized types for arguments.
src/mca.jl
Outdated
| end | ||
|
|
||
| # Reverse the recoding dictionary | ||
| rdi = Dict{Int,eltype(z)}() |
src/mca.jl
Outdated
| # to levels for each variable. | ||
| function make_indicators(Z) | ||
|
|
||
| rd, rdr = Dict[], Dict[] |
There was a problem hiding this comment.
You need to specify types for these dicts.
test/mca.jl
Outdated
| :V3 => ["D", "D", "D", "C", "D", "C", "D", "C"], | ||
| ) | ||
|
|
||
| m = mca(da, 3; vnames = names(da)) |
There was a problem hiding this comment.
Use standard API for construction of the model - fit call.
| function inertia(mca::MCA) | ||
| inr = ( | ||
| Raw = mca.C.I, | ||
| Unadjusted = mca.unadjusted_eigs, | ||
| Benzecri = mca.benzecri_eigs, | ||
| Greenacre = mca.greenacre_eigs, | ||
| ) | ||
| return inr | ||
| end |
There was a problem hiding this comment.
You may want to define multiple getter function for various inertia types.
|
Thanks, I have resolved most of these. The CI fails are due to importation of DataFrames. If you don't want a DataFrames dependency we can probably work around it, otherwise I can update the project file to include this dependency. |
pick f57ef58 check for repeated non-zero signular values
This is an implementation of Correspondence Analysis and Multiple Correspondence Analysis. I think it would be useful for this to be included in the MultivariateStats package.