Wrong usage of the Nataf transformation #70

Open
regislebrun opened this issue Dec 19, 2020 · 4 comments

@regislebrun

I can see two issues in the way the Nataf transformation is defined in Transformation.py:

  • The Nataf transformation is presented as a way to induce correlation between random variables, which is not how it works. This transformation is defined by a set of univariate CDFs and a symmetric positive definite matrix with unit diagonal (a kind of correlation matrix). When it is applied to a random vector whose marginal distributions have these CDFs and whose copula is Gaussian with this matrix as its shape matrix, the transformed random vector follows a standard multivariate Gaussian distribution with independent marginals (see my PhD thesis, chapter 3). If you apply it to a random vector with different marginal distributions or a non-Gaussian copula, you don't get a random vector with such a standard Gaussian distribution.
  • The matrix defining the Nataf transformation may be obtained from the linear correlation of a sample using the ITAM method. This approach has two drawbacks:
    • from a methodological point of view, the matrix should be linked only to the copula, since it describes only the dependence. This is not the case for the linear correlation, whose very existence depends on the marginals (e.g. it is not defined for Cauchy marginal distributions);
    • from a practical point of view, not every linear correlation matrix can be obtained by combining given marginals with some Gaussian copula. The example involving log-normal distributions (see Theorem 1.30 in my thesis) is a well-known instance of this phenomenon.
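To make the first point concrete, here is a minimal NumPy/SciPy sketch of the Nataf transformation as described above: a marginal step followed by decorrelation with the Cholesky factor of the shape matrix R. The function `nataf` is a hypothetical helper, not the UQpy or OpenTURNS API. Case 1 satisfies the Gaussian-copula assumption and returns approximately independent standard normals. Case 2 is a standard counter-example with the same N(0, 1) marginals and zero linear correlation but a non-Gaussian copula: the output is uncorrelated, yet clearly not independent.

```python
import numpy as np
from scipy import stats


def nataf(x, marginals, R):
    """Map samples x (n x d) to independent standard normals.

    Only valid if x has these marginals AND a Gaussian copula with shape matrix R.
    """
    # Step 1: marginal transformation y_i = Phi^{-1}(F_i(x_i)).
    y = np.column_stack([stats.norm.ppf(m.cdf(x[:, i])) for i, m in enumerate(marginals)])
    # Step 2: decorrelation z = L^{-1} y, where R = L L^T (Cholesky factor).
    L = np.linalg.cholesky(R)
    return np.linalg.solve(L, y.T).T


rng = np.random.default_rng(0)
n = 200_000

# Case 1: log-normal marginals (sigma = 1.0 and 0.5) joined by a Gaussian copula with matrix R.
R = np.array([[1.0, 0.7], [0.7, 1.0]])
sigmas = np.array([1.0, 0.5])
g = rng.standard_normal((n, 2)) @ np.linalg.cholesky(R).T  # N(0, 1) pair with correlation R
x_good = np.exp(g * sigmas)
z_good = nataf(x_good, [stats.lognorm(s=s) for s in sigmas], R)
print(np.round(np.cov(z_good.T), 2))  # close to the 2 x 2 identity matrix

# Case 2: N(0, 1) marginals and zero linear correlation, but a non-Gaussian copula
# (X2 = +X1 or -X1 with probability 1/2 each), so R = I is the "fitted" matrix.
x1 = rng.standard_normal(n)
x_bad = np.column_stack([x1, rng.choice([-1.0, 1.0], size=n) * x1])
z_bad = nataf(x_bad, [stats.norm(), stats.norm()], np.eye(2))
print(np.round(np.corrcoef(z_bad.T)[0, 1], 2))       # near 0: uncorrelated ...
print(np.round(np.corrcoef(z_bad.T ** 2)[0, 1], 2))  # 1.0: ... but fully dependent
```

In Case 2 the Nataf map with R = I is the identity, so |Z1| = |Z2| for every sample: the joint law is not the standard bivariate Gaussian even though marginals and correlation match it.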

You could improve a lot your module by making the link between the isoprobabilistic transformations and the probabilistic model more oriented: the transformation is conditioned by the model, not the other way around. You can have a look at the OpenTURNS project (www.openturns.org) and more specifically the Uncertainty/Algorithm/Transformation folder of the sources to have a possible implementation of it. The only fully generic transformation you can rely upon is the Rosenblatt transformation, also available in OpenTURNS.

@mds2120 self-assigned this Jan 29, 2021
@mds2120
Contributor

mds2120 commented Jan 29, 2021

Hi Regis,
We will look into your concerns in detail and respond soon.
Michael

@mds2120
Contributor

mds2120 commented Feb 7, 2021

Hi Regis,
I believe you may be somehow misunderstanding our implementation of the Nataf transformation as I am not able to see the inconsistency between your statements and the manner in which our code is implemented.

We do not present Nataf "as a means to induce correlation between random variables" as you suggest. As you can see in the documentation (https://uqpyproject.readthedocs.io/en/latest/transformations_doc.html), the Nataf transformation is defined as the CDF-based mapping from a vector of arbitrarily distributed random variables X (having known marginals F_i(X_i) and Gaussian copula) to a vector of standard normal random variables Z. If the variables in X are linearly correlated, the Nataf transformation will not produce independent Gaussian random variables, as I believe you suggest. Instead, it will produce linearly correlated normal random variables. In general, the linear correlation of this Gaussian random vector cannot be computed analytically, although it can be determined using the ITAM. See the attached paper on the ITAM, which was written for stochastic translation processes (the stochastic process extension of Nataf) but is equally applicable to random vectors.

Of course, it may arise that the user prescribes a random vector with marginals and correlation matrix that are incompatible with the Nataf model. That is, there are certain well-defined combinations of marginal distributions and correlations for which it is impossible to find a correlated Gaussian random vector that maps to this particular pair of marginals and correlations. This is the well-known Nataf incompatibility to which you refer in your question. The ITAM method is built exactly for this case. It is meant to identify a normal random vector Z and the associated linear correlation matrix that will map as close as possible to the Nataf incompatible vector with incompatible marginals and (non-Gaussian) linear correlation. This leverages the so-called correlation distortion that can be computed directly from the marginal distributions and Gaussian correlation (see the equation for xi_ij in the documentation link provided above).
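The correlation distortion mentioned above can be evaluated numerically for a pair of marginals: the linear correlation of X is a double integral over the Gaussian pair (Z_1, Z_2). The sketch below, assuming SciPy frozen distributions for the marginals, evaluates that integral on a 20 x 20 Gauss-Hermite grid and inverts it with Brent's root finder, relying on the fact that the distorted correlation increases monotonically with the Gaussian correlation. The root finder is a simple stand-in chosen for illustration; it is not the ITAM algorithm of the attached paper, and the function names are hypothetical. A target outside the attainable range is clipped to the nearest endpoint and flagged.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy import stats
from scipy.optimize import brentq

# Probabilists' Gauss-Hermite rule (weight exp(-u^2 / 2)), normalised so that
# sum(WEIGHTS * f(NODES)) approximates E[f(U)] for U ~ N(0, 1).
NODES, WEIGHTS = hermegauss(20)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)


def _to_marginal(m, z):
    # Clip z so that Phi(z) stays strictly inside (0, 1) and ppf stays finite;
    # the clipped far tail carries negligible quadrature weight.
    return m.ppf(stats.norm.cdf(np.clip(z, -8.0, 8.0)))


def distorted_correlation(rho_z, m1, m2):
    """Linear correlation of (X1, X2) with marginals m1, m2 and Gaussian copula rho_z."""
    u, v = np.meshgrid(NODES, NODES, indexing="ij")
    w = np.outer(WEIGHTS, WEIGHTS)
    z2 = rho_z * u + np.sqrt(1.0 - rho_z**2) * v  # corr(u, z2) = rho_z
    x1, x2 = _to_marginal(m1, u), _to_marginal(m2, z2)
    cov = np.sum(w * (x1 - m1.mean()) * (x2 - m2.mean()))
    return cov / (m1.std() * m2.std())


def gaussian_correlation_for(rho_x, m1, m2):
    """Return (rho_z, attainable): the Gaussian correlation reproducing rho_x,
    or the closest endpoint if rho_x cannot be reached with a Gaussian copula."""
    lo = distorted_correlation(-1.0, m1, m2)
    hi = distorted_correlation(1.0, m1, m2)
    if rho_x <= lo:
        return -1.0, False
    if rho_x >= hi:
        return 1.0, False
    f = lambda r: distorted_correlation(r, m1, m2) - rho_x
    return brentq(f, -1.0, 1.0, xtol=1e-12), True


m1, m2 = stats.lognorm(s=1.0), stats.lognorm(s=0.5)
rho_x = distorted_correlation(0.7, m1, m2)
# Closed form for log-normal marginals, used as a check of the quadrature.
exact = np.expm1(0.7 * 1.0 * 0.5) / np.sqrt(np.expm1(1.0**2) * np.expm1(0.5**2))
print(rho_x, exact)                             # both about 0.60
print(gaussian_correlation_for(rho_x, m1, m2))  # recovers rho_z of about 0.7
print(gaussian_correlation_for(0.99, m1, m2))   # (1.0, False): 0.99 is out of reach
```

For these two log-normal marginals the largest attainable linear correlation is about 0.93, reached at rho_z = 1, so any target above it can only be approximated.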

I hope this clarifies our implementation from a theory perspective. That said, I still have some questions regarding your suggestion about its implementation. The following statement is not clear to me: "You could improve a lot your module by making the link between the isoprobabilistic transformations and the probabilistic model more oriented: the transformation is conditioned by the model, not the other way around." On preliminary investigation, I see some differences between the structure of our code and that of OpenTURNS. Specifically, OpenTURNS has broken out separate classes for Nataf and InverseNataf, and it has additional capability related to the generalized Nataf for elliptical copulas. UQpy, on the other hand, includes both Nataf and its inverse in a single class that also deals with the case of linearly correlated random variables. OpenTURNS does not seem to deal with this case of linearly correlated random variables and correlation distortion. Please correct me if I'm wrong on that point.

Thanks for your comments and please clarify if I have misunderstood you.
Michael

Kim_Shields_CaS_2015.pdf

@regislebrun
Author

Hi Michael,
I just read the documentation of the Nataf transformation. I am a little bit surprised by the following points:

  • What you call the Nataf transformation is only half of the transformation that bears the same name in other packages (UQLab, OpenTURNS, and Proban, where it is called the Nataf distribution model): if I understand the documentation correctly, you consider only the marginal transformation part. In other packages this step is followed by a decorrelation step that yields a vector Z with independent components, in relation to the FORM/SORM methods, in which Z is expected to be spherically distributed. But you may want to use it in a context other than the historical one.
  • There is no mention of the Gaussian copula anywhere in the documentation of this transformation, which surprises me, as Z will have a Gaussian distribution iff X has a Gaussian copula. In my view this should be stated; the copula concept is widespread nowadays.
  • In my first post I pointed out that a given correlation matrix may be incompatible with a given list of marginal distributions, regardless of the copula. There is simply no multivariate distribution with this correlation matrix and these marginal distributions (e.g. F_1=LogNormal(0,1), F_2=LogNormal(0,5), rho_12=0.5). This should be stated clearly: it is a limitation of neither the ITAM method nor the Nataf transformation. In my view this point is the most problematic one regarding the way multivariate models are specified. In the same spirit, but less problematic, multivariate distributions may exist with a given set of marginal distributions and correlation matrix, but none with a Gaussian copula. In this case the ITAM method does its best to get the closest model with a Gaussian copula.
  • The fact that the given correlation cannot be recovered (hence the distortion) is far from being the main concern for reliability studies. In my view, the main problem is that the Gaussian copula induces zero tail dependence, which can make the estimated probability of a rare event wrong by several orders of magnitude.
  • In OpenTURNS we are focused on multivariate distributions. They are the first thing you build, either by using a multivariate model (Normal, Student, Dirichlet...), by using combinations of such models (Mixture, KernelMixture...), or by combining marginal distributions and a copula (ComposedDistribution). Then, once this distribution is built, you can ask for its isoprobabilistic transformation, i.e. a transformation that, when applied to a random vector having this distribution, yields a random vector with a spherical distribution, in most cases the multivariate standard Gaussian distribution. It is what I had in mind when I wrote that the transformation should be a consequence of the multivariate transformation. But there is no obligation: it is just a personal opinion ;-).
  • We separated things into many classes in OpenTURNS because we didn't expect users to build their own transformation from scratch and apply it to arbitrary random vectors; we wanted to prevent them from building inconsistent (or at least misleading) objects.
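The log-normal example in the third point can be checked in closed form. Assuming that LogNormal(mu, sigma) denotes the distribution of exp(mu + sigma U) with U ~ N(0, 1), the attainable linear correlations of two log-normal variables are bounded by the comonotonic and countermonotonic cases, whatever the copula. The helper name is hypothetical; the bounds follow from E[exp(aU)] = exp(a^2 / 2).

```python
import numpy as np


def lognormal_correlation_bounds(sigma1, sigma2):
    """Attainable Pearson correlations for LogNormal(mu1, sigma1) and LogNormal(mu2, sigma2).

    The extremes come from comonotonic (X2 increasing in X1) and countermonotonic
    dependence; mu1 and mu2 only rescale the variables and cancel out.
    """
    denom = np.sqrt(np.expm1(sigma1**2) * np.expm1(sigma2**2))
    rho_min = np.expm1(-sigma1 * sigma2) / denom
    rho_max = np.expm1(sigma1 * sigma2) / denom
    return rho_min, rho_max


rho_min, rho_max = lognormal_correlation_bounds(1.0, 5.0)
print(f"attainable range: [{rho_min:.2e}, {rho_max:.2e}]")  # roughly [-2.8e-06, 4.2e-04]
print("rho_12 = 0.5 attainable?", rho_min <= 0.5 <= rho_max)  # False, whatever the copula
```

Since even perfect positive dependence gives a correlation below 0.0005, no joint distribution (Gaussian copula or not) has these marginals and rho_12 = 0.5.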

To help you in your investigations (as you have started to look at our code), here is the list of transformations available in OpenTURNS, all located in the Uncertainty/Algorithm/Transformation root directory:

  • The marginal transformation, more or less equivalent to your Nataf transformation. It maps marginal distributions G_i into marginal distributions H_i, so you get your Nataf transformation if G_i=F_i and H_i=Phi, or its inverse if G_i=Phi and H_i=F_i.
  • The Nataf transformation and its inverse for distributions with elliptical copulas, with two special cases when the distribution has an independent copula or when the distribution is elliptical (i.e. its marginal distributions are adapted to its copula)
  • The Rosenblatt transformation and its inverse. This transformation maps an arbitrary multivariate continuous distribution into a standard multivariate Gaussian distribution using conditional cumulative distribution functions.
    Don't get confused by the separation between Evaluation, Gradient and Hessian, the main point is the evaluation.
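A small sketch of two of the building blocks listed above, written with SciPy only; these are not the OpenTURNS classes, and all names are hypothetical. The model is a made-up example with a non-Gaussian copula: X1 ~ Exponential(1) and X2 | X1 = x1 ~ Normal(x1, 1), so the marginal of X2 is an exponentially modified Gaussian (`scipy.stats.exponnorm` with K = 1). The Rosenblatt transformation uses the conditional CDF and returns independent standard normals, whereas the marginal transformation with G_i = F_i and H_i = Phi only standardizes each component.

```python
import numpy as np
from scipy import stats


def marginal_transformation(x, G, H):
    """Componentwise map x_i -> H_i^{-1}(G_i(x_i)) between two lists of marginals."""
    return np.column_stack([h.ppf(g.cdf(x[:, i])) for i, (g, h) in enumerate(zip(G, H))])


# Illustrative model (an assumption, not from the thread):
#   X1 ~ Exponential(1),  X2 | X1 = x1 ~ Normal(x1, 1).
def rosenblatt(x):
    """Z1 = Phi^{-1}(F_1(x1)),  Z2 = Phi^{-1}(F_{2|1}(x2 | x1))."""
    z1 = stats.norm.ppf(stats.expon.cdf(x[:, 0]))
    z2 = stats.norm.ppf(stats.norm.cdf(x[:, 1], loc=x[:, 0]))
    return np.column_stack([z1, z2])


def inverse_rosenblatt(z):
    """Inverse map, used here to sample the model from independent standard normals."""
    x1 = stats.expon.ppf(stats.norm.cdf(z[:, 0]))
    x2 = stats.norm.ppf(stats.norm.cdf(z[:, 1]), loc=x1)
    return np.column_stack([x1, x2])


rng = np.random.default_rng(1)
z = rng.standard_normal((100_000, 2))
x = inverse_rosenblatt(z)

# Rosenblatt: back to the independent standard normals we started from.
print(np.allclose(rosenblatt(x), z, atol=1e-6))  # True

# Marginal transformation alone (G_i = true marginals, H_i = Phi): standard normal
# marginals, but the components remain strongly correlated.
y = marginal_transformation(x, [stats.expon(), stats.exponnorm(1.0)], [stats.norm(), stats.norm()])
print(np.round(np.corrcoef(y.T)[0, 1], 2))
```

The Rosenblatt map needs the conditional distributions of the model, which is why it is fully generic but only available once the joint distribution is known.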

Sorry for my bad English; I do my best, but past midnight it gets even worse!

Régis

@mds2120
Contributor

mds2120 commented Feb 9, 2021

Regis, thanks for your detailed response.

To your individual points:

  • You are correct that we do not integrate the decorrelation step into the Nataf code. The reason for this is simple. From our perspective, these are two distinct transformations. If you want an uncorrelated Gaussian random vector, you must simply apply the Decorrelate operation in UQpy following the Nataf. Note that we do not call it the Nataf distribution because, imo, it is not a distribution. It is a transformation that induces a distribution of a certain form.
  • I suppose you are correct that the Gaussian copula is not mentioned in the Nataf documentation. This is simply an oversight in the interest of brevity in the docs and not wanting to delve too deeply into theory. The objective is to be as concise and practical as possible. You will notice that we reference your paper, which of course provides these details quite nicely. We can easily add a few lines to improve clarity and make the connection to copula theory.
  • This is a good point and is, frankly, one that I have not given much thought to. I have not worked a great deal with the Nataf in the context of models that do not have a Gaussian copula. What you say makes sense regarding the general Nataf incompatibility. It should be possible, for example, to generalize the ITAM to consider models where the copula is non-Gaussian. Perhaps a reason to collaborate? We can also clarify this in the documentation. If you don't mind, I will ask you to review when it's complete.
  • I agree that recovering the exact correlation is not the primary concern in reliability, although it may be more important for non-Gaussian copulas where tail dependence is indeed important.
  • If you look closely at our documentation, you will see that we follow a similar approach with respect to multivariate distributions. Multivariate distributions are built using the Distributions class as either a joint distribution with independent marginals (JointInd) or a joint distribution with a specified copula (JointCopula). Where we differ is that our Nataf class currently only accepts jointly independent distribution objects (hence implying a Gaussian copula), but the structure of the code would not preclude us from allowing the Nataf class to accept a joint distribution with a copula. In this sense, I believe we are consistent with your line of thinking that the "transformation should be a consequence of the multivariate transformation," but perhaps I still misunderstand. Moreover, it makes it somewhat natural for us to consider coding a generalized Nataf model for non-Gaussian copulas.
  • We take a different mindset with regard to user developments. If users (for example, you) want to build new transformations in UQpy, we strongly encourage it. Of course, we would not allow these to be integrated into the released code without some oversight and assurance that the models are correct and properly implemented. But our spirit is that UQpy should serve as a development environment. We don't just want people to use UQpy. We want people to help us write UQpy!
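Following the point above about letting the Nataf class accept a joint distribution with a copula, the sketch below shows one hypothetical way to make the transformation a consequence of the joint model: a model object built from marginals and a Gaussian copula hands out both the forward and inverse isoprobabilistic transformations, and samples through the inverse. The class and method names are illustrative assumptions, not the UQpy `JointCopula` API or the OpenTURNS API.

```python
from dataclasses import dataclass

import numpy as np
from scipy import stats


@dataclass
class GaussianCopulaModel:
    """Hypothetical joint model: marginals (SciPy frozen distributions) + Gaussian copula."""

    marginals: list
    R: np.ndarray  # copula shape matrix: symmetric positive definite, unit diagonal

    def isoprobabilistic_transformation(self):
        """Return (T, T_inv) such that T(X) ~ N(0, I) when X follows this model."""
        L = np.linalg.cholesky(self.R)

        def T(x):
            # Marginal step (the part called Nataf in UQpy) ...
            y = np.column_stack([stats.norm.ppf(m.cdf(x[:, i])) for i, m in enumerate(self.marginals)])
            # ... followed by the decorrelation step.
            return np.linalg.solve(L, y.T).T

        def T_inv(z):
            y = z @ L.T  # standard normals with correlation matrix R
            return np.column_stack([m.ppf(stats.norm.cdf(y[:, i])) for i, m in enumerate(self.marginals)])

        return T, T_inv

    def sample(self, n, rng):
        # Sampling reuses the inverse transformation, so both come from the same model.
        _, T_inv = self.isoprobabilistic_transformation()
        return T_inv(rng.standard_normal((n, len(self.marginals))))


model = GaussianCopulaModel([stats.gamma(a=2.0), stats.uniform()], np.array([[1.0, -0.4], [-0.4, 1.0]]))
T, T_inv = model.isoprobabilistic_transformation()
rng = np.random.default_rng(2)
x = model.sample(50_000, rng)
print(np.round(np.cov(T(x).T), 2))  # close to the identity matrix
```

Because the user never assembles T by hand, it cannot be applied to a model whose copula it was not derived from; extending the same interface to other copulas would require a different `isoprobabilistic_transformation` (for instance a Rosenblatt map).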

And thanks for the pointers wrt OpenTURNS. No question we will study your code carefully when considering extensions to these parts of our code.

In the end, it seems we are of similar mind on many of these matters and I hope that my responses have helped to clarify.

Best,
Michael.

P.S. No need to apologize. I had no trouble with your English.
