Conversation
bdewilde
left a comment
There was a problem hiding this comment.
Hi @datanizing , thanks for the PR. I have to get the develop branch set up for spaCy v3 — it's been on my backburner for a while, but going to make some time soon — then we'll want to rebase these changes onto that. Will report back when that branch is ready for you, okay?
| self.n_sents = 0 | ||
| self.n_tokens = 0 | ||
| if data: | ||
| if data is not None: |
There was a problem hiding this comment.
Is this a v3-specific issue, or just an edge case bug that has never been properly handled? Could you give a mini-example showing how this failed for you?
There was a problem hiding this comment.
Update: I'm guessing from the commit message that you added this for numpy's sake, e.g.
In [1]: import numpy as np
In [2]: arr = np.ndarray([1,2,3])
In [3]: if arr: print("okay")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-a19c660a75e5> in <module>
----> 1 if arr: print("okay")
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Which makes this an edge case that I never properly handled. I'm going to just add it to the latest develop branch, which is now working (with minimal hacks...) for spacy v3, since the other two changes in this PR were incidentally handled on the way to making the code run and tests pass.
| freqs = itertoolz.frequencies(ngram.lemma_.lower() for ngram in ngrams_) | ||
| ngrams_ = (ngram for ngram in ngrams_ if freqs[ngram.lemma_.lower()] >= min_freq) |
There was a problem hiding this comment.
Again, is this a spaCy v3 issue, or just a difference in expectations? Lemmatizing and lowercasing under the hood when determining ngram frequencies felt like more magic than I would expect, so I stopped at just lowercasing. I'm open to reconsidering, but would want to check that lemmas are available for all languages — I don't think so, in which case this feature would fail for certain langs, but I'm not sure.
There was a problem hiding this comment.
Okay, I just saw that they removed Span.lower_ from the v3 API, so the direct equivalent here would be to use .text.lower(). I'm currently working through a pile of bugs/errors on the v3-based develop branch, so I'll fix it there — just have to get everything working again before I start making larger v3-oriented changes. :)
| ) | ||
| matcher = Matcher(doc.vocab) | ||
| matcher.add("match", on_match, *patterns) | ||
| matcher.add("match", patterns, on_match=on_match) |
There was a problem hiding this comment.
Noted! I remember seeing this in spacy's changelog.
|
Hi @datanizing , your changes (or in one case, a basically equivalent change) are now live in the |
Hi Burton,
as discussed with Jens I have created a few changes to make (parts of)
textacywork withspacy3.0.Description
Take care of API changes in spacy 3.0
Make passing of
numpyarrays possibleMotivation and Context
Make it work with latest
spacyHow Has This Been Tested?
Ran the normal examples
Screenshots (if appropriate):
Types of changes
Checklist: