raylyh (Southampton)


A curated list of resources on document similarity measures (papers, tutorials, code, ...)

256 24 Updated Jul 13, 2022
Jupyter Notebook 2 1 Updated Aug 9, 2023

Code and examples from our talk at the Open Source Summit Europe

Jupyter Notebook 3 2 Updated Sep 28, 2023

Supercharge Your LLM Application Evaluations 🚀

Python 12,699 1,259 Updated Feb 24, 2026

Exchange rates API is a simple and lightweight free service for current and historical foreign exchange rates & crypto exchange rates.

CSS 359 33 Updated Oct 2, 2023

A Pretrained BERT Model for Financial Communications. https://arxiv.org/abs/2006.08097

Jupyter Notebook 643 141 Updated Jul 23, 2023

Mastering Spark for Data Science, published by Packt

Scala 49 46 Updated Jan 18, 2023

A collection of useful functions to be deployed as custom skills for Azure Cognitive Search

C# 319 172 Updated Feb 20, 2026

Tesseract Open Source OCR Engine (main repository)

C++ 72,533 10,515 Updated Feb 21, 2026

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Python 9,113 692 Updated Feb 24, 2026

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

Python 9,772 864 Updated Jan 28, 2026

ECCV2022 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation

Python 5,318 532 Updated Sep 10, 2025

Alias-Free GAN project website and code

1,293 42 Updated Oct 11, 2021

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social …

Python 4,940 1,060 Updated Mar 16, 2024

scikit-learn: machine learning in Python

Python 65,214 26,733 Updated Feb 24, 2026

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

Python 1,562 248 Updated Jun 12, 2025

Curated collection of papers for the NLP practitioner 📖👩‍🔬

1,071 89 Updated Aug 5, 2020

100 Must-Read NLP Papers

3,846 565 Updated Jul 9, 2021

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Python 22,982 3,615 Updated Jul 28, 2024

Building three recognition models to classify fruits from fruit doodle images obtained from the QuickDraw Dataset using Spark.

Jupyter Notebook 1 Updated Dec 7, 2018

Tutorials accompanying the "data science roadmap" picture.

Jupyter Notebook 7,343 1,925 Updated Feb 11, 2025