-
Notifications
You must be signed in to change notification settings - Fork 8
Description
Hi,
I'm not exactly sure the right place for this, but I'm thinking of implementing GaMD and would like to have a good idea if (and how) I'll be able to re-weight the data I get out of the MD simulation. I also want to make sure that I'm understanding the existing issues properly and if there are existing ways of improving the estimation. I really appreciate the availability (and readability) of these analysis scripts and example data since it helps me see how the math actually works in practice.
Based on this 2014 paper, it is concluded that it is not worth using higher order cumulants because of numerical stability. Figure 8e and 8f (Trp-cage) show that the error in the 2nd and 3rd cumulant estimations corresponds to locations where the anharmonicity is large (also near a local maxima). Then in figures S3e and S3f (alanine dipeptide), it shows that the errors also occur near maxima even in an excess of sampling.
Here are the questions I have about this:
- Does this mean that the distribution of dV values is fundamentally skewed (has higher order cumulants) near maxima, or is it just skewed because sampling those maxima is difficult?
- Is this usually not an issue?
- Would higher order moments even help if the underlying problem is truly a sampling issue?
I haven't seen anything about it in any of the more recent papers/reviews, but maybe I missed it! The main reason I'm asking is because it seems like the issues are not arising due to calculation of <dV^n>, but rather in the formula used to combine the moments into cumulants. If it were issues with <dV^n>, the Maclaurin series would have stability issues as well. I recently stumbled onto this wikipedia article, where it mentions that variance and higher order central moments have stability issues when calculated using moments.
Since the third cumulant is just the third central moment (according to this wiki article), I figured that I could try out replacing the formula in your code in terms of moments for the formula in terms of the third central moment (calculated with spicy.stats.skew()). This is the result from the 1D example given:
Maybe you have already tried this and decided the 2nd order results are good and locations where there exists anharmonicity should simply be ignored since they aren't sampled well enough. Or it is possible that this data is particularly well suited for the changes I made. As a check, I wanted to try to use higher order cumulants, but 4th order fails:
I believe because I'm required to use a combination of lower order central moments in a manner similar to how 3rd order cumulants were used prior. There could exist a stable algorithm for all cumulants, but I haven't looked too much. I'm just curious to know if this could help (in other cases, since obviously these graphs show that C2 and C3 agree pretty well?
Thanks for all the great work with GaMD its a really great idea and I'm excited to try it out!

