Paulina Peralta (@PaulinaIA)

AI & Data Science • Electronic Engineer • Barcelona
Building end-to-end ML systems and exploring Generative AI for scientific simulation (Diffusion / DiT)

LinkedIn · Email · GitHub

Python · SQL · Spark · Polars · Docker · AWS


About

I’m an Electronic Engineer focused on applied ML and data engineering, with a particular interest in Generative AI for scientific simulation (diffusion and transformer-based models).
I care about clean repo structure, reproducibility, and projects that run as real software rather than living only in notebooks.


What I’m focused on now

  • Generative models for simulation (Diffusion / DiT) and high-dimensional generation
  • ML systems: scraping → storage → processing (Polars/Spark) → modeling → evaluation
  • Production mindset: modular codebases, documentation-first, scalable pipelines

My pipeline mindset

flowchart LR
  A[Data Sources] --> B[Ingestion / Scraping]
  B --> C[Normalize + Validate]
  C --> D[(Storage)]
  D --> E[Processing<br/>Polars / Spark]
  E --> F[Modeling<br/>sklearn / H2O / DL]
  F --> G[Evaluation + Reporting]
  G --> H[Delivery<br/>Dashboards / API / Notebooks]
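The first stages of this flow can be sketched as a chain of small, named functions. This is a minimal illustration, not code from any of the repositories: `Stage`, `run_pipeline`, `normalize`, and `validate` are hypothetical names, and the records are synthetic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """A named pipeline step that maps a list of records to a list of records."""
    name: str
    fn: Callable[[List[dict]], List[dict]]


def run_pipeline(records: List[dict], stages: List[Stage]) -> Tuple[List[dict], List[tuple]]:
    """Run the stages in order and log how many records leave each one."""
    log = []
    for stage in stages:
        records = stage.fn(records)
        log.append((stage.name, len(records)))
    return records, log


def normalize(records: List[dict]) -> List[dict]:
    """Lowercase keys and strip whitespace from string values."""
    return [
        {k.lower(): v.strip() if isinstance(v, str) else v for k, v in r.items()}
        for r in records
    ]


def validate(records: List[dict]) -> List[dict]:
    """Keep only records that carry an id."""
    return [r for r in records if r.get("id") is not None]


# Synthetic input: the second record has no id and is dropped by validate.
raw = [{"ID": 1, "Name": " a "}, {"Name": "no id"}]
clean, log = run_pipeline(raw, [Stage("normalize", normalize), Stage("validate", validate)])
```

Keeping each stage a plain function makes it easy to test in isolation, and the per-stage record counts give a cheap first signal when a step silently drops data.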

Featured projects (my current favorites)

Bloom — Adaptive fertility prediction (ML + personalization)

A cycle prediction system that combines ML + personalized signals to estimate cycle length and ovulation timing.

  • Clean modular design (feature engineering + predictors + evaluation)
  • Emphasis on interpretability and practical use

flowchart LR
  X[(User Logs)] --> FE[Feature Engineering]
  FE --> M1[Cycle Predictor]
  FE --> M2[Ovulation Classifier]
  M1 --> OUT[Predictions + Insights]
  M2 --> OUT

Repo: https://github.com/PaulinaIA/Bloom
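One simple way to combine a population-level estimate with personalized signals is a shrinkage blend: the more cycles a user has logged, the more weight their own average receives. The sketch below is an illustration of that idea only and is not taken from the Bloom repository; `predict_cycle_length`, the `population_mean` of 28.0 days, and `prior_strength` are assumptions chosen for the example.

```python
from statistics import mean
from typing import Sequence


def predict_cycle_length(
    user_cycles: Sequence[float],
    population_mean: float = 28.0,   # illustrative placeholder, not a fitted value
    prior_strength: float = 3.0,     # how many "virtual" cycles the prior is worth
) -> float:
    """Blend a user's observed mean cycle length with a population prior."""
    n = len(user_cycles)
    if n == 0:
        # No personal history yet: fall back to the population estimate.
        return population_mean
    weight = n / (n + prior_strength)
    return weight * mean(user_cycles) + (1 - weight) * population_mean


# With three logged cycles of 30 days, the weight is 3 / (3 + 3) = 0.5,
# so the estimate sits halfway between 30 and the 28-day prior.
estimate = predict_cycle_length([30, 30, 30])
```

The estimate starts at the population value for a new user and moves toward the personal average as more logs arrive, which keeps early predictions stable.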

Dark Eye Core — Threat Intelligence ETL + dashboards

Pipeline that extracts and normalizes IoCs from multiple sources and visualizes them in Grafana.

  • Multi-source ingestion (AbuseIPDB / URLhaus / OTX)
  • Normalization + relational storage + observability-ready design

flowchart LR
  S1[AbuseIPDB] --> N[Normalizers]
  S2[URLhaus] --> N
  S3[AlienVault OTX] --> N
  N --> DB[(PostgreSQL)]
  DB --> G[Grafana Dashboards]

Repo: https://github.com/PaulinaIA/dark_eye_core
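The normalization step can be pictured as one small function per source, each mapping a source-specific payload to a shared record type. This is a hedged sketch, not the repository's code: the `Indicator` class, the function names, and the payload field names (`ipAddress`, `url`, `indicator`, `type`) are assumptions made for illustration, and the sample values are synthetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """Common shape for an indicator of compromise (IoC)."""
    source: str
    kind: str    # e.g. "ip" or "url"
    value: str


# One normalizer per source. The input field names are assumed payload shapes.
def normalize_abuseipdb(item: dict) -> Indicator:
    return Indicator("abuseipdb", "ip", item["ipAddress"].strip())


def normalize_urlhaus(item: dict) -> Indicator:
    return Indicator("urlhaus", "url", item["url"].strip())


def normalize_otx(item: dict) -> Indicator:
    return Indicator("otx", item["type"].lower(), item["indicator"].strip())


NORMALIZERS = {
    "abuseipdb": normalize_abuseipdb,
    "urlhaus": normalize_urlhaus,
    "otx": normalize_otx,
}


def normalize_all(batches: dict) -> list:
    """Normalize every batch and drop repeats of the same (kind, value) pair."""
    seen = set()
    result = []
    for source, items in batches.items():
        for item in items:
            indicator = NORMALIZERS[source](item)
            key = (indicator.kind, indicator.value)
            if key not in seen:
                seen.add(key)
                result.append(indicator)
    return result


# Synthetic batches using documentation-reserved addresses and domains.
batches = {
    "abuseipdb": [{"ipAddress": " 192.0.2.1 "}],
    "urlhaus": [{"url": "http://example.com/a"}, {"url": "http://example.com/a"}],
}
indicators = normalize_all(batches)
```

A flat record like this maps directly onto a single relational table, which keeps dashboard queries simple.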

Moltbook Safety — Web scraping + behavioral signals → karma prediction

End-to-end data engineering + applied ML project.

  • Web scraping → relational model → processing with Polars/Spark → modeling (H2O AutoML)
  • Focus on content/behavior features and responsible analysis

flowchart LR
  W[Web Scraping] --> R[(Relational Tables)]
  R --> P[Processing<br/>Polars / Spark]
  P --> FS[Feature Set]
  FS --> ML[Modeling<br/>H2O AutoML]
  ML --> EV[Metrics + Insights]

Repo: https://github.com/PaulinaIA/moltbook-safety
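The step from relational tables to a feature set can be illustrated with an in-memory SQLite database. This is a sketch under stated assumptions rather than the project's schema: the `users` and `posts` tables, their columns, and the rows are hypothetical, and SQLite stands in here for whatever storage and Polars/Spark processing the repository uses.

```python
import sqlite3

# In-memory database with a hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        karma   INTEGER
    );
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(user_id),
        body    TEXT NOT NULL
    );
    """
)

# Synthetic rows standing in for scraped data.
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [(1, "ada", 120), (2, "bob", 5)])
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(10, 1, "hello world"), (11, 1, "a longer post body"), (12, 2, "hi")],
)

# One row per user: simple behavior features plus the karma target.
features = conn.execute(
    """
    SELECT u.user_id,
           COUNT(p.post_id)    AS n_posts,
           AVG(LENGTH(p.body)) AS avg_post_length,
           u.karma
    FROM users AS u
    LEFT JOIN posts AS p ON p.user_id = u.user_id
    GROUP BY u.user_id
    ORDER BY u.user_id
    """
).fetchall()
```

The `LEFT JOIN` keeps users with no posts in the feature set, so the modeling step sees every user rather than only the active ones.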


Tech stack (curated)

Python · SQL · Spark · Polars · Airflow · Docker · AWS · scikit-learn · PyTorch · TensorFlow
Also: R · MATLAB · C/C++ · Embedded/IoT background


How I work

  • I care about clarity (simple architecture, readable code)
  • I prioritize reproducibility (structured repos, deterministic pipelines when possible)
  • I like projects with meaning + impact, not only metrics
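As one small example of what a deterministic pipeline step can look like, a train/test split can be derived from a hash of each record's identifier instead of a random shuffle. This is a generic sketch, not code from the repositories; `assign_split` and the identifier format are hypothetical.

```python
import hashlib


def assign_split(record_id: str, test_fraction: float = 0.2) -> str:
    """Assign a record to "train" or "test" based only on its identifier."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


# The same identifier always lands in the same split, across runs and machines.
first = assign_split("user-1")
second = assign_split("user-1")
```

Because the assignment depends only on the identifier, adding new records never moves existing ones between splits.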

Contact

📩 pauliperalta@gmail.com
🔗 https://www.linkedin.com/in/paulina-peralta-916a46140/

Pinned

  1. moltbook-safety (Public, Jupyter Notebook)

    Data engineering pipeline for Moltbook: scraping → relational storage → feature engineering → ML models for karma prediction.

  2. dark_eye_core (Public, Python)

    Threat Intelligence ETL: multi-source IoC ingestion, normalization, storage, and monitoring.

  3. Bloom (Public, Jupyter Notebook)

    ML-powered fertility prediction system that combines population models with individual personalization to predict menstrual cycles, ovulation windows, and health anomalies.

  4. peliculas-mongodb (Public, Python)

    MongoDB-based movie catalog manager with modular architecture, analytics aggregations, and CLI/Web UI.

  5. tensorflow-deep-learning (Public, Jupyter Notebook)

    Forked from mrdbourke/tensorflow-deep-learning

    All course materials for the Zero to Mastery Deep Learning with TensorFlow course.

  6. machinelearning-az (Public, Jupyter Notebook)

    Forked from joanby/machinelearning-az

    Repository for the Machine Learning A to Z course with R and Python.