Stars
A curated list of resources on document similarity measures (papers, tutorials, code, ...)
Code and examples from our talk at the Open Source Summit Europe
Supercharge Your LLM Application Evaluations 🚀
Exchange rates API is a simple and lightweight free service for current and historical foreign exchange rates & crypto exchange rates.
A Pretrained BERT Model for Financial Communications. https://arxiv.org/abs/2006.08097
Mastering Spark for Data Science, published by Packt
A collection of useful functions to be deployed as custom skills for Azure Cognitive Search
Tesseract Open Source OCR Engine (main repository)
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
ECCV2022 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation
VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social …
scikit-learn: machine learning in Python
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Curated collection of papers for the NLP practitioner 📖👩‍🔬
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Building three recognition models to classify fruits, using fruit doodle images obtained from the QuickDraw dataset with Spark.
Tutorials accompanying the "data science roadmap" picture.