Skip to content

Conversation

@BradenDKelly
Copy link
Collaborator

@BradenDKelly BradenDKelly commented Jul 14, 2020

This supercedes PR #2

Modified code so that prefactors were calculating prior to loops. This saved about 15-20% on the time for the big.molden file.

Vectorized angular momentums using np.frompyfunc which uses numpy ufunc. This got total speed increase up to 35%

All changes were made only in overlap.py

Relevant changes:

Additions:

  • converted function compute_overlap_gaussian_1d into a ufunc so that z,y,z and angular momentums could be broadcasted rather than calculated sequentially.

Added

function def iter_cart_alphabet2(n):
    """Loop over powers of Cartesian basis functions in alphabetical order."""

    ang_mom = []
    for nx in range(n, -1, -1):
        for ny in range(n - nx, -1, -1):
            nz = n - nx - ny
            ang_mom.append(np.array((nx, ny, nz)))
    return np.asarray(ang_mom)

This is because vectorization needed an explicit array of angular momentums rather than using iterators in for loops.
Removals:

  • removed compute_overlap_gaussian_3D. Broadcasting is used to hand the 3D case.

Modified:

  • modified compute_overlap_gaussian_1d. removed all prefactor calcualtions before the double for loop. Since these all depend on coordinates which do not change for angular momentum calculations, all variables depending on primitive exponents and shell centers are calculated prior to looping over angular momentums.

Small changes made: whereever possible I reduced FLOPS by defining a variable, and then using it several times.

e.g., if x * y showed up numerous times, particularly inside a loop, I defined x_y = x * y in order to save on multiplications

big.molden takes ~25.5 seconds on my computer. Original takes ~38 seconds.

@BradenDKelly BradenDKelly requested a review from FarnazH July 14, 2020 09:28
@FarnazH
Copy link
Owner

FarnazH commented Jul 15, 2020

Thanks @BradenDKelly for your PR and contribution. This may not be the best way to measure the performance of the code, but I used the snippet below on master and your_branch to load big.molden file 100 times and average the loading time; the result was 54.92s on master and 57.35s on your_branch one my computer (which is running very slow these days).

import timeit

code_to_test = """
from iodata import load_one
mol = load_one("iodata/test/data/big.molden")
"""

elapsed_time = timeit.timeit(code_to_test, number=100) / 100
print('elapsed time = ', elapsed_time)

This is very promising, can you please do a thorough performance testing before making a PR to theochem/iodata?

@BradenDKelly BradenDKelly force-pushed the vectorized_angular_momentums branch from 9529bf3 to b6cd6a0 Compare July 26, 2020 18:25
@BradenDKelly BradenDKelly force-pushed the vectorized_angular_momentums branch 2 times, most recently from a4ef8ef to b3d7f65 Compare August 20, 2020 18:35
@BradenDKelly BradenDKelly force-pushed the vectorized_angular_momentums branch 2 times, most recently from 8054376 to 4966800 Compare August 20, 2020 20:19
@FarnazH FarnazH force-pushed the vectorized_angular_momentums branch from 3889285 to 566cdb9 Compare August 21, 2020 14:42
@FarnazH FarnazH force-pushed the vectorized_angular_momentums branch from 5b6a5fb to 3ddc816 Compare August 22, 2020 02:46
tovrstra and others added 8 commits August 23, 2020 12:02
These fixes are unrelated to the current PR. They appeared because of
an update of pycodestyle.
The name of this file has changed to formats/fcidump.py in
origin/master, so it causes a merge conflict. The change in formats/molpro.py
has nothing to do with this PR (making a pure-python implementation of
overlap), so the change was reverted to make merging easier.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants