@MaxGiting
Collaborator

I haven't done any comparisons yet. But removing regexes and shortening others can only help, right?

So far I have:

  • Added checker to the generic regex which removes the need for 13 other regexes.
  • Added reader to the generic regex which removes the need for 10 other regexes.

I've made sure that every regex removed had a related user agent in the tests.
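
A minimal sketch of that check, not the library's actual pattern list: the generic regexes, the removed patterns, and the user agents below are all hypothetical stand-ins. It shows why a specific pattern becomes redundant once its keyword is part of the generic alternation, and how one test user agent per removed pattern confirms nothing was lost.

```php
<?php
// Hypothetical, simplified generic regexes; the real one in the library may differ.
$genericBefore = '[a-z0-9\-_]*(bot|crawl|spider)';
$genericAfter  = '[a-z0-9\-_]*(bot|crawl|spider|checker|reader)';

// Hypothetical removed pattern => illustrative user agent it used to catch.
$removed = [
    'LinkChecker' => 'LinkChecker/1.0 (+http://example.com)',
    'SiteChecker' => 'Mozilla/5.0 (compatible; SiteChecker/2.1)',
    'FeedReader'  => 'ExampleFeedReader/0.3',
];

// Case-insensitive, unanchored match; '#' is the delimiter so '/' needs no care.
function genericMatches(string $pattern, string $agent): bool
{
    return preg_match('#' . $pattern . '#i', $agent) === 1;
}

// Returns the removed patterns whose test agent the generic regex misses.
function uncoveredPatterns(string $generic, array $removed): array
{
    $missed = [];
    foreach ($removed as $pattern => $agent) {
        if (!genericMatches($generic, $agent)) {
            $missed[] = $pattern;
        }
    }
    return $missed;
}

var_export(uncoveredPatterns($genericBefore, $removed));
var_export(uncoveredPatterns($genericAfter, $removed));
```

With the keywords added, the second call returns an empty list, which is the condition a removal has to satisfy before the specific pattern can go.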

I am going to shorten a lot of the really long regexes as they just don't need to be so long.

What are people's thoughts on adding the word extractor to the generic regex? This would eliminate the need for another 6 regexes, which I feel are very specific to bots.

@JayBizzle
Owner

LGTM 👍

@MaxGiting
Collaborator Author

  • Added extractor to the generic regex which removes the need for 6 other regexes.
  • Removed [0-9] regex range from 34 regex patterns. Seemed overkill.
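
The thread does not show the affected patterns, so the pair below is a hypothetical illustration that assumes the range appeared as a trailing `[0-9]` after the bot name. Because the patterns are searched unanchored, dropping the trailing range can only widen what the pattern matches:

```php
<?php
// Hypothetical pattern pair; the real patterns are not shown in this thread.
$withRange    = 'ExampleFetcher\/[0-9]';
$withoutRange = 'ExampleFetcher';

// Illustrative user agents (assumptions, not test fixtures).
$agents = [
    'ExampleFetcher/2.4',               // numeric version
    'ExampleFetcher/beta',              // non-numeric version
    'ExampleFetcher',                   // no version at all
    'Mozilla/5.0 (X11; Linux x86_64)',  // ordinary browser
];

// Returns the agents that a single unanchored, case-insensitive pattern hits.
function agentsHit(string $pattern, array $agents): array
{
    return array_values(array_filter($agents, function (string $agent) use ($pattern): bool {
        return preg_match('#' . $pattern . '#i', $agent) === 1;
    }));
}

var_export(agentsHit($withRange, $agents));
var_export(agentsHit($withoutRange, $agents));
```

In this sketch the shorter pattern catches every agent the longer one did, plus the unversioned and non-numeric variants, and still leaves the browser string alone.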

@MaxGiting
Collaborator Author

MaxGiting commented Jan 6, 2019

If there is whitespace in a regex, is there a rule that it must be "escaped" with a backslash? I vaguely remember this is due to people using the raw export, not because of PHP.

For example this line and the next differ.

'Keyword\ Density',

'Keywords Research',
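
A quick way to check this in PHP, as a sketch: the user agent and the `spaceTest` helper are assumptions made for illustration. In PCRE's default mode a plain space and `\ ` both match a literal space. They only differ when the pattern is compiled with the `x` (extended) modifier, which ignores unescaped whitespace in the pattern.

```php
<?php
// The two spellings, as PHP single-quoted strings.
// In single quotes, "\ " stays as a backslash followed by a space.
$escaped = 'Keyword\ Density';
$plain   = 'Keyword Density';

// Illustrative user agent (an assumption, not taken from the test fixtures).
$agent = 'Mozilla/5.0 (compatible; Keyword Density/0.9)';

// Hypothetical helper: wraps a pattern in delimiters plus the given modifiers.
function spaceTest(string $pattern, string $agent, string $modifiers): bool
{
    return preg_match('/' . $pattern . '/' . $modifiers, $agent) === 1;
}

$results = [
    'escaped, default'  => spaceTest($escaped, $agent, 'i'),
    'plain, default'    => spaceTest($plain, $agent, 'i'),
    'escaped, extended' => spaceTest($escaped, $agent, 'ix'),
    'plain, extended'   => spaceTest($plain, $agent, 'ix'),
];

var_export($results);
```

Only the last case fails to match. So the backslash is unnecessary unless a consumer of the patterns compiles them in extended mode.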

@MaxGiting
Collaborator Author

MaxGiting commented Jan 7, 2019

44 regexes removed, plus a lot of tidying up. All changes are listed below.

  • Added to generic regex:

    • checker
    • reader
    • extractor
    • monitoring
    • analyzer
  • Remove [0-9] range where not needed.

  • Remove \.com where not needed.

  • Remove \/ where not needed.

  • Reduce the length of some very long regexes.

  • Remove the \ before spaces, as there's no need to escape whitespace. Make this consistent across all regexes.

@MaxGiting MaxGiting changed the title from "Performance WIP" to "Performance and general clean up" Jan 7, 2019
@MaxGiting
Collaborator Author

I've seen roughly a 4-6% increase in speed. It will easily be eaten up as we add more user agents, but it was a good clean-out nonetheless.
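
For anyone wanting to reproduce a comparison like this, here is a rough timing sketch. It is not the benchmark used for the figure above: the `timePatterns` helper, both pattern lists, and the user agents are hypothetical. It also assumes the patterns are joined into a single alternation before matching.

```php
<?php
// Hypothetical helper: joins patterns into one alternation and times repeated matching.
function timePatterns(array $patterns, array $agents, int $iterations = 1000): array
{
    $regex = '/(' . implode('|', $patterns) . ')/i';
    $matches = 0;

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        foreach ($agents as $agent) {
            // preg_match returns 1 on a match and 0 otherwise.
            $matches += preg_match($regex, $agent);
        }
    }

    return ['seconds' => microtime(true) - $start, 'matches' => $matches];
}

// Hypothetical "before" and "after" lists: specific names vs. generic keywords.
$before = ['bot', 'crawl', 'LinkChecker', 'SiteChecker', 'FeedReader'];
$after  = ['bot', 'crawl', 'checker', 'reader'];

// Illustrative user agents (assumptions, not the project's fixtures).
$agents = [
    'LinkChecker/1.0',
    'Mozilla/5.0 (compatible; SiteChecker/2.1)',
    'ExampleFeedReader/0.3',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
];

$resultBefore = timePatterns($before, $agents);
$resultAfter  = timePatterns($after, $agents);

printf("before: %.4fs, after: %.4fs\n", $resultBefore['seconds'], $resultAfter['seconds']);
```

The match counts should be identical for both lists, otherwise the two runs are not comparing like with like. Timings from a list this small are noisy, so a real comparison would run against the full pattern set and test user agents, repeated several times.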

@MaxGiting MaxGiting merged commit 9b56510 into master Jan 7, 2019
@MaxGiting MaxGiting deleted the dev branch January 7, 2019 20:19
gplumb added a commit to gplumb/NetCrawlerDetect that referenced this pull request Jan 12, 2019
gplumb added a commit to gplumb/NetCrawlerDetect that referenced this pull request Jan 12, 2019