diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 959ac37..df2d722 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.10.1","generation_timestamp":"2024-02-21T02:17:12","documenter_version":"1.2.1"}} \ No newline at end of file +{"documenter":{"julia_version":"1.10.1","generation_timestamp":"2024-02-21T02:24:42","documenter_version":"1.2.1"}} \ No newline at end of file diff --git a/dev/details/index.html b/dev/details/index.html index 0731d1f..ff912c6 100644 --- a/dev/details/index.html +++ b/dev/details/index.html @@ -4,4 +4,4 @@ \mathrm{min}\quad & \frac{1}{2} \Vert G - X \Vert^2 \\ \mathrm{s.t.}\quad & X_{ii} = 1, \quad i = 1, \ldots , n, \\ & X \in S_{+}^{n} -\end{aligned}\]

Normal to Anything (NORTA)

Given:

Do:

+\end{aligned}\]

Normal to Anything (NORTA)

Given:

Do:

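In rough outline, NORTA draws correlated standard normal variates with the target correlation, maps them to uniform variates through the standard normal CDF, and then applies each marginal quantile function. Below is a minimal, self-contained sketch of that recipe (an illustration only, not Bigsimr's internal implementation; norta_sketch is a made-up name):

using Distributions, LinearAlgebra

function norta_sketch(n, R, margins)
    d = length(margins)
    L = cholesky(Symmetric(R)).L              # R must be a valid (positive definite) correlation matrix
    Z = L * randn(d, n)                       # correlated standard normals, d × n
    U = cdf.(Normal(), Z)                     # probability integral transform: uniform margins
    return [quantile(margins[i], U[i, j]) for j in 1:n, i in 1:d]   # n × d, analogous to rvec
end

norta_sketch(5, [1.0 0.7; 0.7 1.0], [Normal(), LogNormal()])
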
diff --git a/dev/function_index/index.html b/dev/function_index/index.html index 9d63f89..43c8f82 100644 --- a/dev/function_index/index.html +++ b/dev/function_index/index.html @@ -1,2 +1,2 @@ -Index · Bigsimr.jl

Index

+Index · Bigsimr.jl

Index

diff --git a/dev/getting_started/15ad7dd8.svg b/dev/getting_started/15ad7dd8.svg new file mode 100644 index 0000000..4ef50f4 --- /dev/null +++ b/dev/getting_started/15ad7dd8.svg @@ -0,0 +1,1475 @@ [new SVG plot: 1475 lines of markup omitted]
diff --git a/dev/getting_started/29851180.svg b/dev/getting_started/29851180.svg new file mode 100644 index 0000000..22f9444 --- /dev/null +++ b/dev/getting_started/29851180.svg @@ -0,0 +1,1522 @@ [new SVG plot: 1522 lines of markup omitted]
diff --git a/dev/getting_started/413288e4.svg b/dev/getting_started/413288e4.svg deleted file mode 100644 index 11d6e33..0000000 --- a/dev/getting_started/413288e4.svg +++ /dev/null @@ -1,1385 +0,0 @@ [deleted SVG plot: 1385 lines of markup omitted]
diff --git a/dev/getting_started/cc250c52.svg b/dev/getting_started/cc250c52.svg deleted file mode 100644 index 00a0cd1..0000000 --- a/dev/getting_started/cc250c52.svg +++ /dev/null @@ -1,2239 +0,0 @@ [deleted SVG plot: 2239 lines of markup omitted]
diff --git a/dev/getting_started/79aecc74.svg b/dev/getting_started/e08869a4.svg similarity index 77% rename from dev/getting_started/79aecc74.svg rename to dev/getting_started/e08869a4.svg index 88ac038..713d865 100644 --- a/dev/getting_started/79aecc74.svg +++ b/dev/getting_started/e08869a4.svg @@ -1,334 +1,334 @@ [re-rendered SVG plot: 334 changed lines of markup omitted]
diff --git a/dev/getting_started/index.html b/dev/getting_started/index.html index d4483c8..93513eb 100644 --- a/dev/getting_started/index.html +++ b/dev/getting_started/index.html @@ -10,47 +10,47 @@
 | 114 | 14 | 75 |
 | 115 | 18 | 76 |
 | 116 | 20 | 68 |
- 110 rows omitted

Let’s look at the joint distribution of Ozone and Temperature.

Example block output

We can see that not all margins are normally distributed; the ozone level is highly skewed. Though we don’t know the true distribution of ozone levels, we can go forward assuming that it is log-normally distributed.

To simulate observations from this joint distribution, we need to estimate the correlation and the marginal parameters.

Estimating Correlation

To estimate the correlation, we use cor with an argument specifying the type of correlation to estimate. The options are Pearson, Spearman, or Kendall.

julia> ρ = cor(Matrix(df), Pearson)2×2 Matrix{Float64}:
+       110 rows omitted

Let’s look at the joint distribution of Ozone and Temperature.

Example block output

We can see that not all margins are normally distributed; the ozone level is highly skewed. Though we don’t know the true distribution of ozone levels, we can go forward assuming that it is log-normally distributed.
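
One quick way to eyeball that assumption (an illustrative aside, assuming the same Plots.jl setup used for the figures on this page) is to plot the log of Ozone, which should look roughly bell-shaped:

histogram(log.(df.Ozone), nbins = 30, legend = false)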

To simulate observations from this joint distribution, we need to estimate the correlation and the marginal parameters.

Estimating Correlation

To estimate the correlation, we use cor with an argument specifying the type of correlation to estimate. The options are Pearson, Spearman, or Kendall.

julia> ρ = cor(Matrix(df), Pearson)2×2 Matrix{Float64}:
  1.0      0.69836
  0.69836  1.0

Defining Marginal Distributions

Next we can estimate the marginal parameters. Assuming that Temperature is normally distributed, it has parameters:

julia> μ_Temp = mean(df.Temp)77.87068965517241
julia> σ_Temp = std(df.Temp)9.48548563759966

and assuming that Ozone is log-normally distributed, it has parameters:

julia> μ_Ozone = mean(log.(df.Ozone))3.418515100812007
julia> σ_Ozone = sqrt(mean((log.(df.Ozone) .- mean(log.(df.Ozone))).^2))0.8617359690270703
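
As an aside not taken from the original guide, the same estimates can be recovered with maximum-likelihood fitting from Distributions.jl (for the Normal margin, fit_mle returns the 1/n estimate of σ, slightly smaller than the 1/(n-1) value from std):

fit_mle(LogNormal, df.Ozone)   # matches μ_Ozone and σ_Ozone computed above
fit_mle(Normal, df.Temp)       # μ matches mean(df.Temp); σ is the MLE (1/n) variant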

Finally we take the parameters and put them into a vector of margins:

julia> margins = [Normal(μ_Temp, σ_Temp), LogNormal(μ_Ozone, σ_Ozone)]2-element Vector{Distribution{Univariate, Continuous}}:
  Normal{Float64}(μ=77.87068965517241, σ=9.48548563759966)
  LogNormal{Float64}(μ=3.418515100812007, σ=0.8617359690270703)

Correlation Bounds

Given a vector of margins, the theoretical lower and upper correlation coefficients can be estimated using simulation:

julia> lower, upper = cor_bounds(margins, Pearson);
julia> lower2×2 Matrix{Float64}: - 1.0 -0.807527 - -0.807527 1.0
julia> upper2×2 Matrix{Float64}: - 1.0 0.808883 - 0.808883 1.0

The pearson_bounds function uses more sophisticated methods to determine the theoretical lower and upper Pearson correlation bounds. It also requires more computational time.

julia> lower, upper = pearson_bounds(margins);
julia> lower2×2 Matrix{Float64}: + 1.0 -0.827695 + -0.827695 1.0
julia> upper2×2 Matrix{Float64}: + 1.0 0.826898 + 0.826898 1.0

The pearson_bounds function uses more sophisticated methods to determine the theoretical lower and upper Pearson correlation bounds. It also requires more computational time.

julia> lower, upper = pearson_bounds(margins);
julia> lower2×2 Matrix{Float64}: 1.0 -0.821122 -0.821122 1.0
julia> upper2×2 Matrix{Float64}: 1.0 0.821122 0.821122 1.0

Simulating Multivariate Data

Let’s now simulate 10,000 observations from the joint distribution using rvec:

julia> x = rvec(10_000, ρ, margins)10000×2 Matrix{Float64}:
- 73.5149   33.4932
- 87.8056   28.5674
- 68.7403   12.457
- 61.3621    7.22837
- 74.7645   70.4748
- 75.4706   20.6238
- 69.9828   10.0259
- 56.979     6.31042
- 82.285    45.5811
- 75.3011   37.3477
+ 87.4109   22.3591
+ 75.7104   19.7585
+ 61.4875    7.91311
+ 87.2304   93.3604
+ 80.5391   20.9102
+ 74.0107   42.8532
+ 92.2595   76.5276
+ 59.7821   20.2999
+ 73.4545    4.69385
+ 77.3004   18.7691
   ⋮
- 74.3841   33.5517
- 76.7599   29.6025
- 93.9216  136.805
- 87.9215   91.6839
- 84.193    56.5316
- 71.6759   21.7275
- 58.5467    8.13902
- 80.2676   41.4696
- 69.3834   10.5428

Visualizing Bivariate Data

df_sim = DataFrame(x, [:Temp, :Ozone]);
+ 82.8761   34.0147
+ 75.0262   41.8732
+ 67.5626   53.1339
+ 64.4427    7.12912
+ 81.6657   47.3555
+ 63.7596    7.44514
+ 88.6181   44.4193
+ 83.7085  134.4
+ 71.7335   35.387

Visualizing Bivariate Data

df_sim = DataFrame(x, [:Temp, :Ozone]);
 
 histogram2d(df_sim.:Temp, df_sim.:Ozone, nbins=250, legend=false,
 			xlims=extrema(df.:Temp) .+ (-10, 10),
-			ylims=extrema(df.:Ozone) .+ (0, 20))
Example block output

Compared to Uncorrelated Samples

We can compare the bivariate distribution above to one where no correlation is taken into account.

df_sim2 = DataFrame(
+			ylims=extrema(df.:Ozone) .+ (0, 20))
Example block output

Compared to Uncorrelated Samples

We can compare the bivariate distribution above to one where no correlation is taken into account.

df_sim2 = DataFrame(
 	Temp  = rand(margins[1], 10000),
 	Ozone = rand(margins[2], 10000)
 );
 
 histogram2d(df_sim2.:Temp, df_sim2.:Ozone, nbins=250, legend=false,
 			xlims=extrema(df.:Temp) .+ (-10, 10),
-			ylims=extrema(df.:Ozone) .+ (0, 20))
Example block output
+			ylims=extrema(df.:Ozone) .+ (0, 20))
Example block output
diff --git a/dev/index.html b/dev/index.html index be47b46..5c587c5 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,2 +1,2 @@ -Bigsimr.jl · Bigsimr.jl

Bigsimr.jl Package

The Bigsimr package provides a helpful set of tools for simulating high-dimensional multivariate data with arbitrary marginal distributions. In particular, Bigsimr implements:

  • Simulation of multivariate data via Gaussian copulas (NORTA algorithm)
  • Converting between different types of correlations (Pearson, Spearman, and Kendall)
  • Computing the nearest positive definite correlation matrix (quadratically convergent algorithm)
  • Pearson correlation matching to account for non-linear transformations
  • Generating random positive semi-definite correlation matrices
+Bigsimr.jl · Bigsimr.jl

Bigsimr.jl Package

The Bigsimr package provides a helpful set of tools for simulating high-dimensional multivariate data with arbitrary marginal distributions. In particular, Bigsimr implements the following (a brief workflow sketch follows the list):

  • Simulation of multivariate data via Gaussian copulas (NORTA algorithm)
  • Converting between different types of correlations (Pearson, Spearman, and Kendall)
  • Computing the nearest positive definite correlation matrix (quadratically convergent algorithm)
  • Pearson correlation matching to account for non-linear transformations
  • Generating random positive semi-definite correlation matrices
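
As a quick orientation, these pieces combine into a short end-to-end workflow. The sketch below is illustrative only (the margins and target correlation are made-up placeholders, not an official example) and uses functions documented in the pages that follow:

using Bigsimr, Distributions, Statistics

margins = [Normal(0, 1), LogNormal(0, 1), Gamma(2, 2)]     # hypothetical marginal distributions
rho     = [1.0 0.5 0.3
           0.5 1.0 0.4
           0.3 0.4 1.0]                                    # made-up target Pearson correlation

rho_adj = pearson_match(rho, margins)                              # adjust for the non-linear NORTA transformation
rho_adj = is_correlation(rho_adj) ? rho_adj : cor_nearPD(rho_adj)  # repair positive definiteness if needed
x = rvec(10_000, rho_adj, margins)                                 # 10_000 × 3 simulated observations
cor(x, Pearson)                                                    # should be close to rho
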
diff --git a/dev/main_functions/index.html b/dev/main_functions/index.html index 2cdeb1f..2201497 100644 --- a/dev/main_functions/index.html +++ b/dev/main_functions/index.html @@ -20,7 +20,7 @@ 1.82689 58.417 0.580125 4.73678 4.75506 11.2741 1.92511 9.44913 0.651013 - 3.19883 39.3707 0.581781source
Bigsimr.rmvnFunction
rmvn(n::Int[, μ::AbstractVector{<:Real}], Σ::AbstractMatrix{<:Real})

Utilizes available threads for fast generation of multivariate normal samples.

Examples

julia> μ = [-3, 1, 10];
+ 3.19883  39.3707    0.581781
source
Bigsimr.rmvnFunction
rmvn(n::Int[, μ::AbstractVector{<:Real}], Σ::AbstractMatrix{<:Real})

Utilizes available threads for fast generation of multivariate normal samples.

Examples

julia> μ = [-3, 1, 10];
 
 julia> S = cor_randPD(3)
 3×3 Matrix{Float64}:
@@ -39,7 +39,7 @@
  -3.00716    0.00897664  10.1173
  -3.00928    0.851214     9.74029
  -3.43021    0.402382     9.51274
- -1.77849    0.157933     9.15944
source

Correlation Types

Correlation Computation

Statistics.corFunction
cor(x[, y], ::CorType)

Compute the correlation matrix of a given type.

The possible correlation types are:

Examples

julia> x = [-1.62169     0.0158613   0.500375  -0.794381
+ -1.77849    0.157933     9.15944
source

Correlation Types

Correlation Computation

Statistics.corFunction
cor(x[, y], ::CorType)

Compute the correlation matrix of a given type.

The possible correlation types are:

Examples

julia> x = [-1.62169     0.0158613   0.500375  -0.794381
              2.50689     3.31666    -1.3049     2.16058
              0.495674    0.348621   -0.614451  -0.193579
              2.32149     2.18847    -1.83165    2.08399
@@ -69,7 +69,7 @@
   1.0        0.733333  -0.688889   0.555556
   0.733333   1.0       -0.688889   0.555556
  -0.688889  -0.688889   1.0       -0.422222
-  0.555556   0.555556  -0.422222   1.0
source
Bigsimr.cor_fastFunction
cor_fast(X::AbstractMatrix{<:Real}, C::CorType=Pearson)

Calculate the correlation matrix in parallel using available threads.

source
Bigsimr.cor_boundsFunction
cor_bounds(d1::UnivariateDistribution, d2::UnivariateDistribution, cortype::CorType, samples::Int)

Compute the stochastic lower and upper correlation bounds between two marginal distributions.

This method relies on sampling from each distribution and then estimating the specified correlation between the sorted samples. Because the samples are random, there will be some variation in the answer for each call to cor_bounds. Increasing the number of samples will increase the accuracy of the estimate, but will also take longer to sort. Therefore ≈100,000 samples (the default) are recommended so that it runs fast while still returning a good estimate.

The possible correlation types are:

Examples

julia> using Distributions
+  0.555556   0.555556  -0.422222   1.0
source
Bigsimr.cor_fastFunction
cor_fast(X::AbstractMatrix{<:Real}, C::CorType=Pearson)

Calculate the correlation matrix in parallel using available threads.

source
Bigsimr.cor_boundsFunction
cor_bounds(d1::UnivariateDistribution, d2::UnivariateDistribution, cortype::CorType, samples::Int)

Compute the stochastic lower and upper correlation bounds between two marginal distributions.

This method relies on sampling from each distribution and then estimating the specified correlation between the sorted samples. Because the samples are random, there will be some variation in the answer for each call to cor_bounds. Increasing the number of samples will increase the accuracy of the estimate, but will also take longer to sort. Therefore ≈100,000 samples (the default) are recommended so that it runs fast while still returning a good estimate.

The possible correlation types are:

Examples

julia> using Distributions
 
 julia> A = Normal(78, 10); B = LogNormal(3, 1);
 
@@ -83,7 +83,7 @@
 (lower = -0.7631871539735006, upper = 0.7624398609255689)
 
 julia> cor_bounds(A, B, Spearman, 250_000)
-(lower = -1.0, upper = 1.0)
source
cor_bounds(margins::AbstractVector{<:UnivariateDistribution}, cortype::CorType, samples::Int)

Compute the stochastic pairwise lower and upper correlation bounds between a set of marginal distributions.

The possible correlation types are:

Examples

julia> using Distributions
+(lower = -1.0, upper = 1.0)
source
cor_bounds(margins::AbstractVector{<:UnivariateDistribution}, cortype::CorType, samples::Int)

Compute the stochastic pairwise lower and upper correlation bounds between a set of marginal distributions.

The possible correlation types are:

Examples

julia> using Distributions
 
 julia> margins = [Normal(78, 10), Binomial(20, 0.2), LogNormal(3, 1)];
 
@@ -99,7 +99,7 @@
 3×3 Matrix{Float64}:
  1.0       0.982471  0.766727
  0.982471  1.0       0.798379
- 0.766727  0.798379  1.0
source

Random Correlation Matrix

Bigsimr.cor_randPSDFunction
cor_randPSD(T::Type{<:Real}, d::Int, k::Int=d-1)

Return a random positive semidefinite correlation matrix where d is the dimension ($d ≥ 1$) and k is the number of factor loadings ($1 ≤ k < d$).

See also: cor_randPD

Examples

julia> cor_randPSD(Float64, 4, 2)
+ 0.766727  0.798379  1.0
source
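
To make the sort-based idea behind cor_bounds concrete, here is a rough standalone sketch for the Pearson case (an illustration of the approach described above, not the package's actual implementation; bounds_sketch is a made-up name):

using Distributions, Statistics

function bounds_sketch(d1, d2, samples = 100_000)
    a = sort(rand(d1, samples))
    b = sort(rand(d2, samples))
    (lower = cor(a, reverse(b)),   # antimonotonic pairing estimates the lower bound
     upper = cor(a, b))            # comonotonic pairing estimates the upper bound
end

bounds_sketch(Normal(78, 10), LogNormal(3, 1))   # roughly (lower ≈ -0.76, upper ≈ 0.76), as in the example above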

Random Correlation Matrix

Bigsimr.cor_randPSDFunction
cor_randPSD(T::Type{<:Real}, d::Int, k::Int=d-1)

Return a random positive semidefinite correlation matrix where d is the dimension ($d ≥ 1$) and k is the number of factor loadings ($1 ≤ k < d$).

See also: cor_randPD

Examples

julia> cor_randPSD(Float64, 4, 2)
 4×4 Matrix{Float64}:
  1.0        0.276386   0.572837   0.192875
  0.276386   1.0        0.493806  -0.352386
@@ -118,7 +118,7 @@
   1.0        0.81691   -0.27188    0.289011
   0.81691    1.0       -0.44968    0.190938
  -0.27188   -0.44968    1.0       -0.102597
-  0.289011   0.190938  -0.102597   1.0
source
Bigsimr.cor_randPDFunction
cor_randPD(T::Type{<:Real}, d::Int, k::Int=d-1)

The same as cor_randPSD, but calls cor_fastPD to ensure that the returned matrix is positive definite.

Examples

julia> cor_randPD(Float64, 4, 2)
+  0.289011   0.190938  -0.102597   1.0
source
Bigsimr.cor_randPDFunction
cor_randPD(T::Type{<:Real}, d::Int, k::Int=d-1)

The same as cor_randPSD, but calls cor_fastPD to ensure that the returned matrix is positive definite.

Examples

julia> cor_randPD(Float64, 4, 2)
 4×4 Matrix{Float64}:
   1.0        0.458549  -0.33164    0.492572
   0.458549   1.0       -0.280873   0.62544
@@ -137,7 +137,7 @@
   1.0        0.356488   0.701521  -0.251671
   0.356488   1.0        0.382787  -0.117748
   0.701521   0.382787   1.0       -0.424952
- -0.251671  -0.117748  -0.424952   1.0
source

Converting Correlation Types

Bigsimr.cor_convertFunction
cor_convert(X, from, to)

Convert from one type of correlation matrix to another.

Within this package, conversion is typically from either Spearman or Kendall to Pearson, since the Pearson correlation is what drives the generation of random multivariate normal samples. After converting, the correlation matrix may no longer be positive semidefinite, so it is recommended to check it with LinearAlgebra.isposdef and, if necessary, repair it with cor_nearPD.

See also: cor_nearPD, cor_fastPD

The possible correlation types are:

Examples

julia> r = [ 1.0       -0.634114   0.551645   0.548993
+ -0.251671  -0.117748  -0.424952   1.0
source

Converting Correlation Types

Bigsimr.cor_convertFunction
cor_convert(X, from, to)

Convert from one type of correlation matrix to another.

Within this package, conversion is typically from either Spearman or Kendall to Pearson, since the Pearson correlation is what drives the generation of random multivariate normal samples. After converting, the correlation matrix may no longer be positive semidefinite, so it is recommended to check it with LinearAlgebra.isposdef and, if necessary, repair it with cor_nearPD.

See also: cor_nearPD, cor_fastPD

The possible correlation types are:

Examples

julia> r = [ 1.0       -0.634114   0.551645   0.548993
             -0.634114   1.0       -0.332105  -0.772114
              0.551645  -0.332105   1.0        0.143949
              0.548993  -0.772114   0.143949   1.0];
@@ -157,7 +157,7 @@
   0.383807  -0.576435   0.0962413   1.0
 
 julia> r == cor_convert(r, Pearson, Pearson)
-true
source

Correlation Utils

Convert a correlation matrix using other utilities.

Bigsimr.cor_constrainFunction
cor_constrain(X::AbstractMatrix{<:Real})

Constrain a matrix so that its diagonal elements are 1 and its off-diagonal elements are bounded between -1 and 1, then make the result symmetric by copying the upper triangle to the lower triangle.

See also: cor_constrain!

Examples

julia> a = [ 0.802271   0.149801  -1.1072     1.13451
+true
source

Correlation Utils

Convert a correlation matrix using other utilities.

Bigsimr.cor_constrainFunction
cor_constrain(X::AbstractMatrix{<:Real})

Constrain a matrix so that its diagonal elements are 1 and its off-diagonal elements are bounded between -1 and 1, then make the result symmetric by copying the upper triangle to the lower triangle.

See also: cor_constrain!

Examples

julia> a = [ 0.802271   0.149801  -1.1072     1.13451
              0.869788  -0.824395   0.38965    0.965936
             -1.45353   -1.29282    0.417233  -0.362526
              0.638291  -0.682503   1.12092   -1.27018];
@@ -167,7 +167,7 @@
   1.0       0.149801  -1.0        1.0
   0.149801  1.0        0.38965    0.965936
  -1.0       0.38965    1.0       -0.362526
-  1.0       0.965936  -0.362526   1.0
source
Bigsimr.cov2corFunction
cov2cor(X::AbstractMatrix{<:Real})

Transform a covariance matrix into a correlation matrix.

Details

If $X \in \mathbb{R}^{n \times n}$ is a covariance matrix, then

\[\tilde{X} = D^{-1/2} X D^{-1/2}, \quad D = \mathrm{diag(X)}\]

is the associated correlation matrix.

source
Bigsimr.cov2cor!Function
cov2cor!(X::AbstractMatrix{<:Real})

Same as cov2cor, except that the matrix is updated in place to save memory.

source
Bigsimr.is_correlationFunction
is_correlation(X)

Check if the given matrix passes all the checks required to be a valid correlation matrix.

Criteria

A matrix is a valid correlation matrix if:

  • Square
  • Symmetric
  • Diagonal elements are equal to 1
  • Off diagonal elements are between -1 and 1
  • Is positive definite

Examples

julia> x = rand(3, 3)
+  1.0       0.965936  -0.362526   1.0
source
Bigsimr.cov2corFunction
cov2cor(X::AbstractMatrix{<:Real})

Transform a covariance matrix into a correlation matrix.

Details

If $X \in \mathbb{R}^{n \times n}$ is a covariance matrix, then

\[\tilde{X} = D^{-1/2} X D^{-1/2}, \quad D = \mathrm{diag(X)}\]

is the associated correlation matrix.

source
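
The formula translates directly into Julia. A one-line illustrative version (not the package's implementation; cov2cor_sketch is a made-up name) might look like:

using LinearAlgebra

cov2cor_sketch(X) = (d = 1 ./ sqrt.(diag(X)); d .* X .* d')   # D^{-1/2} X D^{-1/2}

cov2cor_sketch([4.0 1.2; 1.2 9.0])   # returns [1.0 0.2; 0.2 1.0]
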
Bigsimr.cov2cor!Function
cov2cor!(X::AbstractMatrix{<:Real})

Same as cov2cor, except that the matrix is updated in place to save memory.

source
Bigsimr.is_correlationFunction
is_correlation(X)

Check if the given matrix passes all the checks required to be a valid correlation matrix.

Criteria

A matrix is a valid correlation matrix if:

  • Square
  • Symmetric
  • Diagonal elements are equal to 1
  • Off diagonal elements are between -1 and 1
  • Is positive definite

Examples

julia> x = rand(3, 3)
 3×3 Matrix{Float64}:
  0.834446  0.183285  0.837872
  0.637295  0.270709  0.458703
@@ -193,7 +193,7 @@
 ];
 
 julia> is_correlation(r_negdef)
-false
source

Nearest Correlation Matrix

Provided by NearestCorrelationMatrix.jl

NearestCorrelationMatrix.NewtonType
Newton(; kwargs...)

Parameters

  • τ: a tuning parameter controlling the smallest eigenvalue of the resulting matrix
  • tol: the tolerance used as a stopping condition during iterations
  • tol_cg: the tolerance used in the conjugate gradient method
  • tol_ls: the tolerance used in the line search method
  • iter_outer: the max number of Newton steps
  • iter_inner: the max number of refinements during the Newton step
  • iter_cg: the max number of iterations in the conjugate gradient method
source
NearestCorrelationMatrix.DirectProjectionType
DirectProjection(; τ=1e-6)

Single-step projection of the input matrix into the set of correlation matrices. Useful when a "close" correlation matrix is needed without concern for it being optimal.

Parameters

  • τ: a tuning parameter controlling the smallest eigenvalue of the resulting matrix
source
NearestCorrelationMatrix.nearest_corFunction
nearest_cor(R::AbstractMatrix{<:AbstractFloat}, alg::NearestCorrelationAlgorithm)

Return the nearest positive definite correlation matrix to R.

Examples

julia> import LinearAlgebra: isposdef
+false
source

Nearest Correlation Matrix

Provided by NearestCorrelationMatrix.jl

NearestCorrelationMatrix.NewtonType
Newton(; kwargs...)

Parameters

  • τ: a tuning parameter controlling the smallest eigenvalue of the resulting matrix
  • tol: the tolerance used as a stopping condition during iterations
  • tol_cg: the tolerance used in the conjugate gradient method
  • tol_ls: the tolerance used in the line search method
  • iter_outer: the max number of Newton steps
  • iter_inner: the max number of refinements during the Newton step
  • iter_cg: the max number of iterations in the conjugate gradient method
source
NearestCorrelationMatrix.DirectProjectionType
DirectProjection(; τ=1e-6)

Single-step projection of the input matrix into the set of correlation matrices. Useful when a "close" correlation matrix is needed without concern for it being optimal. A short usage sketch follows the parameter list below.

Parameters

  • τ: a tuning parameter controlling the smallest eigenvalue of the resulting matrix
source
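
As a usage sketch (assuming the algorithm types and nearest_cor are available once NearestCorrelationMatrix.jl is loaded; keyword names follow the parameter lists above):

using LinearAlgebra, NearestCorrelationMatrix

r = [1.0  0.9  0.6
     0.9  1.0  0.9
     0.6  0.9  1.0]                              # symmetric but indefinite, so not a valid correlation matrix

p1 = nearest_cor(r, Newton())                    # quadratically convergent Newton algorithm
p2 = nearest_cor(r, DirectProjection(τ = 1e-6))  # one-step projection: faster, less optimal
isposdef(p1), isposdef(p2)                       # (true, true)
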
NearestCorrelationMatrix.nearest_corFunction
nearest_cor(R::AbstractMatrix{<:AbstractFloat}, alg::NearestCorrelationAlgorithm)

Return the nearest positive definite correlation matrix to R.

Examples

julia> import LinearAlgebra: isposdef
 
 julia> r = [
     1.00 0.82 0.56 0.44
@@ -233,4 +233,4 @@
  0.440514  0.847352  0.219582  1.0
 
 julia> isposdef(r)
-true
source

Simplified Wrappers

Bigsimr.cor_fastPDFunction
cor_fastPD(X::AbstractMatrix{<:Real}, tau=1e-6)

Return a positive definite correlation matrix that is close to X. tau is a tuning parameter that controls the minimum eigenvalue of the resulting matrix. tau can be set to zero if only a positive semidefinite matrix is needed.

See also: cor_nearPD, cor_nearPSD

source

Pearson Correlation Matching

PearsonCorrelationMatch.pearson_matchFunction
pearson_match(p::Real, d1::UnivariateDistribution, d2::UnivariateDistribution, n=7)

Compute the Pearson correlation coefficient to be used in a bivariate Gaussian copula.

source
pearson_match(rho, margins, n=21)

Pairwise compute the Pearson correlation coefficient to be used in a bivariate Gaussian copula. Ensures that the resulting matrix is a valid correlation matrix.

source
+truesource

Simplified Wrappers

Bigsimr.cor_fastPDFunction
cor_fastPD(X::AbstractMatrix{<:Real}, tau=1e-6)

Return a positive definite correlation matrix that is close to X. tau is a tuning parameter that controls the minimum eigenvalue of the resulting matrix. tau can be set to zero if only a positive semidefinite matrix is needed.

See also: cor_nearPD, cor_nearPSD

source

Pearson Correlation Matching

PearsonCorrelationMatch.pearson_matchFunction
pearson_match(p::Real, d1::UnivariateDistribution, d2::UnivariateDistribution, n=7)

Compute the Pearson correlation coefficient to be used in a bivariate Gaussian copula.

source
pearson_match(rho, margins, n=21)

Pairwise compute the Pearson correlation coefficient to be used in a bivariate Gaussian copula. Ensures that the resulting matrix is a valid correlation matrix.

source
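
A minimal usage sketch for the two-distribution method (the margins and the target value 0.7 are made up for illustration; pearson_match is assumed to be re-exported by Bigsimr, as it is documented here):

using Bigsimr, Distributions

pearson_match(0.7, Normal(78, 10), LogNormal(3, 1))   # latent-normal correlation that reproduces a Pearson correlation of ≈0.7 for these margins
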
diff --git a/dev/nearest_correlation_matrix/index.html b/dev/nearest_correlation_matrix/index.html index f764432..9a1934f 100644 --- a/dev/nearest_correlation_matrix/index.html +++ b/dev/nearest_correlation_matrix/index.html @@ -20,16 +20,16 @@ 687263.0 163202.0 620491.0 … 34539.0 20698.0 3417.0 946835.0 157578.0 287362.0 33074.0 69588.0 64613.0 499893.0 238058.0 395895.0 33389.0 26519.0 37366.0
julia> spearman_corr = cor(m, Spearman);
julia> pearson_corr = cor_convert(spearman_corr, Spearman, Pearson);
julia> isposdef(spearman_corr), isposdef(pearson_corr)(true, false)

We see that the converted Pearson correlation matrix is no longer positive definite. This will result in a failure during the multivariate normal generation, particularly during the Cholesky decomposition.
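
The failure can be traced directly to the factorization (a quick illustrative check, not part of the original walkthrough):

using LinearAlgebra

eigmin(Symmetric(pearson_corr))      # not strictly positive, so the matrix is not positive definite
cholesky(Symmetric(pearson_corr))    # throws PosDefException; rvec checks for this, as shown below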

julia> rvec(10, pearson_corr, margins)ERROR: ArgumentError: `rho` must be a valid correlation matrix

We can fix this by computing the nearest PD correlation.

julia> adjusted_pearson_corr = cor_nearPD(pearson_corr);
julia> isposdef(adjusted_pearson_corr)true
julia> rvec(10, adjusted_pearson_corr, margins)10×200 Matrix{Float64}: - 893883.0 222669.0 388803.0 … 54417.0 35955.0 57425.0 - 1.40455e6 848455.0 582523.0 55276.0 49918.0 74646.0 - 625789.0 186944.0 225111.0 11910.0 38493.0 28807.0 - 1.34698e6 1.40865e6 592593.0 69739.0 48229.0 70136.0 - 142383.0 260246.0 108040.0 4404.0 47630.0 76318.0 - 265401.0 158078.0 525072.0 … 20310.0 45050.0 36956.0 - 634005.0 99818.0 444215.0 49720.0 26315.0 7675.0 - 444567.0 53638.0 268495.0 18170.0 14696.0 460.0 - 1.32785e6 96024.0 231676.0 35399.0 65123.0 18382.0 - 2.49658e6 1.08841e6 642524.0 101957.0 58716.0 17603.0

Benchmarking

What's more impressive is that computing the nearest correlation matrix in Julia is fast!

julia> @benchmark cor_nearPD(pearson_corr)
+ 613205.0        236761.0        355899.0  …   43565.0  41871.0  146220.0
+      1.73742e6  785202.0        789449.0     109505.0  88877.0   41784.0
+      1.12123e6       1.72087e6  255879.0      26441.0  36238.0    2346.0
+ 536381.0        170449.0        648052.0      59253.0  48139.0   91289.0
+      1.02996e6  898205.0        478684.0      41281.0  37002.0   91934.0
+      1.97149e6   24531.0        402659.0  …   55724.0  51387.0   33158.0
+      1.33043e6  654061.0        350922.0      78321.0  59880.0   20784.0
+      1.01255e6  494247.0        406456.0      18425.0  43895.0   28220.0
+      1.62858e6       1.5101e6   741128.0      44835.0  67513.0   45925.0
+ 828294.0        334329.0        271455.0      44070.0  59523.0   55779.0

Benchmarking

What's more impressive is that computing the nearest correlation matrix in Julia is fast!

julia> @benchmark cor_nearPD(pearson_corr)
 BenchmarkTools.Trial: 
   memory estimate:  7.15 MiB
   allocs estimate:  160669
@@ -41,26 +41,26 @@
   --------------
   samples:          513
   evals/sample:     1

Let's scale up to a larger correlation matrix:

julia> m3000 = cor_randPSD(3000) |> m -> cor_convert(m, Spearman, Pearson)3000×3000 Matrix{Float64}:
-  1.0          -0.00716851  -0.0169243   …   0.0233197    -0.0263826
- -0.00716851    1.0         -0.0216623      -0.00441853   -0.0109313
- -0.0169243    -0.0216623    1.0             0.00710223   -0.0204181
- -0.0164256    -0.0392892   -0.00920841      0.000708792  -0.0622288
-  0.000583932   0.0121458    0.0138495      -0.0116943     0.0437344
- -0.0176298    -0.0149731    0.0134523   …   0.00524458   -0.00289558
-  0.00832772   -0.0328138   -0.01436         0.0262169     0.0199443
-  0.00331502   -0.014238     0.00486661      0.0106294     0.00756114
-  0.0281853     0.00479124  -0.00480704     -0.00214362   -0.0110185
-  0.00553774    0.00691192  -0.0310059      -0.0251624     0.00414895
+  1.0          0.0164724     0.0407423   …   0.0103902     0.00277579
+  0.0164724    1.0           0.0191188      -0.0141403    -0.0110236
+  0.0407423    0.0191188     1.0             0.0143008    -0.0137548
+  0.0224455   -0.00664169    0.00422915      0.00622397    0.0148866
+ -0.00337379  -0.0198807     0.00572056      0.000599983  -0.0112769
+  0.0241558    0.0289758     0.0129858   …  -0.0135809     0.0419741
+  0.0254826    0.00313339   -0.00272253     -0.0115147     0.0012211
+  0.00299632  -0.0299389     0.00972675      0.0110493     0.0131804
+  0.00509257  -0.00582499   -0.00520097     -0.0199324     0.0102773
+  0.0336515   -0.0180486     0.00269183     -0.00947528   -0.00229805
   ⋮                                      ⋱
- -0.00950963    0.0201731    0.0404003       0.017257      0.0189769
- -0.0123194     0.00180822   0.00917517     -0.0226344     0.0225617
- -0.0233259     0.0146338   -0.0280363       0.00629901   -0.00898944
-  0.00605169    0.00982002  -0.00790663      0.0151823     0.00495999
-  0.00864805    0.0379612   -0.0269966   …  -0.00694579   -0.0397791
-  0.023253      0.00792592  -0.0259121       0.0128322     0.00303604
- -0.00447362    0.00256023  -0.0163627       0.0168914     0.029479
-  0.0233197    -0.00441853   0.00710223      1.0           0.0145632
- -0.0263826    -0.0109313   -0.0204181       0.0145632     1.0
julia> m3000_PD = cor_nearPD(m3000);
julia> isposdef(m3000)false
julia> isposdef(m3000_PD)true
julia> @benchmark cor_nearPD(m3000)
+ -0.031977    -0.0220573     0.00596284      0.0276635     0.0151599
+ -0.0364148    0.0201439    -0.00905063     -0.00198937    0.0269192
+ -0.0190205    0.0158941     0.0591733       0.0263583    -0.0185312
+ -0.0118402    0.0119738    -0.0122215       0.0290106     0.00835123
+ -0.013987    -0.0436087    -0.0242289   …  -0.00376803   -0.00946016
+  0.0148983   -0.000398019   0.0246131      -0.00108647   -0.01132
+ -0.0055436   -0.0336603    -0.0099331       0.00465229    0.0140295
+  0.0103902   -0.0141403     0.0143008       1.0           0.0324425
+  0.00277579  -0.0110236    -0.0137548       0.0324425     1.0
julia> m3000_PD = cor_nearPD(m3000);
julia> isposdef(m3000)false
julia> isposdef(m3000_PD)true
julia> @benchmark cor_nearPD(m3000)
 BenchmarkTools.Trial: 
   memory estimate:  3.95 GiB
   allocs estimate:  78597
@@ -82,4 +82,4 @@
   maximum time:     2.139 s (8.48% GC)
   --------------
   samples:          10
-  evals/sample:     1

And it's not too far off from the nearest:

julia> m3000_PD_fast = cor_fastPD(m3000);
julia> norm(m3000 - m3000_PD)0.7343265090972988
julia> norm(m3000 - m3000_PD_fast)0.7671348742021591
+ evals/sample: 1

And it's not too far off from the nearest:

julia> m3000_PD_fast = cor_fastPD(m3000);
julia> norm(m3000 - m3000_PD)0.7343783322172888
julia> norm(m3000 - m3000_PD_fast)0.7672542887494229
diff --git a/dev/pearson_matching/index.html b/dev/pearson_matching/index.html index 761c4a1..288413f 100644 --- a/dev/pearson_matching/index.html +++ b/dev/pearson_matching/index.html @@ -2,15 +2,15 @@ Pearson Matching · Bigsimr.jl

Pearson Matching

Correlation Conversion

Let's say we really wanted to estimate the Spearman correlation between the temperature and ozone.

julia> spearman_corr = cor(Matrix(df), Spearman)2×2 Matrix{Float64}:
  1.0       0.774043
  0.774043  1.0
julia> cor_bounds(margins[1], margins[2], Spearman)(lower = -1.0, upper = 1.0)

If we just use the Spearman correlation when we simulate data, then we make two errors:

  1. The NORTA algorithm is expecting a Pearson correlation
  2. The non-linear transformation in the NORTA step does not guarantee that the input correlation is the same as the output.

Here is what we get when we use the Spearman correlation directly with no transformation:

julia> x2 = rvec(1_000_000, spearman_corr, margins);
julia> cor(x2, Spearman)2×2 Matrix{Float64}: - 1.0 0.758658 - 0.758658 1.0

Let's try to address problem 1 and convert the Spearman correlation to a Pearson correlation.

julia> adjusted_spearman_corr = cor_convert(spearman_corr, Spearman, Pearson);
julia> x3 = rvec(1_000_000, adjusted_spearman_corr, margins);
julia> cor(x3, Spearman)2×2 Matrix{Float64}: - 1.0 0.773755 - 0.773755 1.0

Notice that the estimated Spearman correlation is essentially the same as the target Spearman correlation. This is because the transformation in the NORTA step is monotonic, which means that rank-based correlations are preserved. As a consequence, we can match the Spearman correlation exactly (up to stochastic error) with an explicit transformation.

Pearson Matching

We can employ a Pearson matching algorithm that determines the necessary input correlation in order to achieve the target Pearson correlation. Let's now address problem 2.

julia> pearson_corr = cor(Matrix(df), Pearson)2×2 Matrix{Float64}:
+ 1.0       0.758561
+ 0.758561  1.0

Let's try to address problem 1 and convert the Spearman correlation to a Pearson correlation.

julia> adjusted_spearman_corr = cor_convert(spearman_corr, Spearman, Pearson);
julia> x3 = rvec(1_000_000, adjusted_spearman_corr, margins);
julia> cor(x3, Spearman)2×2 Matrix{Float64}: + 1.0 0.774593 + 0.774593 1.0

Notice that the estimated Spearman correlation is essentially the same as the target Spearman correlation. This is because the transformation in the NORTA step is monotonic, which means that rank-based correlations are preserved. As a consequence, we can match the Spearman correlation exactly (up to stochastic error) with an explicit transformation.
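
This rank-preservation property is easy to check directly (an illustrative aside, not from the original page; x_mono is a made-up name):

using Bigsimr, Statistics

z = rmvn(100_000, [1.0 0.7; 0.7 1.0]);
x_mono = exp.(z);                            # exp is strictly increasing, so the ranks are unchanged
cor(z, Spearman) == cor(x_mono, Spearman)    # true: monotone transformations preserve rank correlations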

Pearson Matching

We can employ a Pearson matching algorithm that determines the necessary input correlation in order to achieve the target Pearson correlation. Let's now address problem 2.

julia> pearson_corr = cor(Matrix(df), Pearson)2×2 Matrix{Float64}:
  1.0      0.69836
  0.69836  1.0

If we use the measured correlation directly, then the estimated correlation from the simulated data is far off:

julia> x4 = rvec(1_000_000, pearson_corr, margins);
julia> cor(x4, Pearson)2×2 Matrix{Float64}: - 1.0 0.571791 - 0.571791 1.0

The estimated correlation is much too low. Let's do some Pearson matching and observe the results.

julia> adjusted_pearson_corr = pearson_match(pearson_corr, margins)2×2 Matrix{Float64}:
+ 1.0       0.574778
+ 0.574778  1.0

The estimated correlation is much too low. Let's do some Pearson matching and observe the results.

julia> adjusted_pearson_corr = pearson_match(pearson_corr, margins)2×2 Matrix{Float64}:
  1.0       0.850495
  0.850495  1.0
julia> x5 = rvec(1_000_000, adjusted_pearson_corr, margins);
julia> cor(x5)2×2 Matrix{Float64}: - 1.0 0.698914 - 0.698914 1.0

Now the simulated data results in a correlation structure that matches the target Pearson correlation almost exactly (up to stochastic error)!

+ 1.0 0.698734 + 0.698734 1.0

Now the simulated data results in a correlation structure that matches the target Pearson correlation almost exactly (up to stochastic error)!