AI & Data Science • Electronic Engineer • Barcelona
Building end-to-end ML systems and exploring Generative AI for scientific simulation (Diffusion / DiT)
I’m an Electronic Engineer working in applied ML and data engineering, with a particular interest in Generative AI for scientific simulation (diffusion and transformer-based models).
I care a lot about clean repo structure, reproducibility, and building projects that go beyond notebooks and hold up as real software.
- Generative models for simulation (Diffusion / DiT) and high-dimensional generation
- ML systems: scraping → storage → processing (Polars/Spark) → modeling → evaluation
- Production mindset: modular codebases, documentation-first, scalable pipelines
flowchart LR
A[Data Sources] --> B[Ingestion / Scraping]
B --> C[Normalize + Validate]
C --> D[(Storage)]
D --> E[Processing<br/>Polars / Spark]
E --> F[Modeling<br/>sklearn / H2O / DL]
F --> G[Evaluation + Reporting]
G --> H[Delivery<br/>Dashboards / API / Notebooks]
A cycle prediction system that combines ML + personalized signals to estimate cycle length and ovulation timing.
- Clean modular design (feature engineering + predictors + evaluation)
- Emphasis on interpretability and practical use
flowchart LR
X[(User Logs)] --> FE[Feature Engineering]
FE --> M1[Cycle Predictor]
FE --> M2[Ovulation Classifier]
M1 --> OUT[Predictions + Insights]
M2 --> OUT
Repo: https://github.com/PaulinaIA/Bloom
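To make the cycle-length part of the diagram above concrete, here is a minimal baseline sketch. It is an assumption for illustration, not the method used in the Bloom repo: it takes logged period start dates and predicts the next start from the mean of the most recent cycle lengths. The function names `cycle_lengths` and `predict_next_start` and the `window` parameter are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def cycle_lengths(period_starts: list[date]) -> list[int]:
    """Return the number of days between consecutive period start dates."""
    ordered = sorted(period_starts)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def predict_next_start(period_starts: list[date], window: int = 3) -> date:
    """Baseline (illustrative assumption): last start + mean of the last `window` cycles."""
    lengths = cycle_lengths(period_starts)
    if not lengths:
        raise ValueError("need at least two logged period starts")
    expected_length = round(mean(lengths[-window:]))
    return max(period_starts) + timedelta(days=expected_length)


if __name__ == "__main__":
    logs = [date(2024, 1, 1), date(2024, 1, 29), date(2024, 2, 27), date(2024, 3, 26)]
    print(cycle_lengths(logs))
    print(predict_next_start(logs))
```

A baseline like this is mainly useful as a reference point: a learned predictor with personalized features should beat it on held-out cycles to justify the extra complexity.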
Pipeline that extracts and normalizes IoCs from multiple sources and visualizes them in Grafana.
- Multi-source ingestion (AbuseIPDB / URLhaus / OTX)
- Normalization + relational storage + observability-ready design
flowchart LR
S1[AbuseIPDB] --> N[Normalizers]
S2[URLhaus] --> N
S3[AlienVault OTX] --> N
N --> DB[(PostgreSQL)]
DB --> G[Grafana Dashboards]
Repo: https://github.com/PaulinaIA/dark_eye_core
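The normalization step in the diagram above can be sketched roughly as follows. This is a simplified illustration, not the code from dark_eye_core: the `Indicator` schema, the `normalize` and `deduplicate` helpers, and the input keys (`"ip"`, `"url"`, `"indicator"`, `"type"`) are all hypothetical and do not claim to match the real feed formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """Common shape for an IoC, whatever feed it came from (hypothetical schema)."""
    source: str
    kind: str   # e.g. "ip" or "url"
    value: str


# One small normalizer per source; the input keys are assumptions for this sketch.
NORMALIZERS = {
    "abuseipdb": lambda rec: Indicator("abuseipdb", "ip", rec["ip"].strip()),
    "urlhaus": lambda rec: Indicator("urlhaus", "url", rec["url"].strip()),
    "otx": lambda rec: Indicator("otx", rec["type"], rec["indicator"].strip()),
}


def normalize(source: str, record: dict) -> Indicator:
    """Map a raw record from a known source onto the common Indicator shape."""
    try:
        normalizer = NORMALIZERS[source]
    except KeyError:
        raise ValueError(f"unknown source: {source}") from None
    return normalizer(record)


def deduplicate(indicators: list[Indicator]) -> list[Indicator]:
    """Keep the first occurrence of each (kind, value) pair, preserving order."""
    seen = set()
    unique = []
    for item in indicators:
        key = (item.kind, item.value)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

Keeping one normalizer per source behind a shared schema means a new feed only needs a new entry in the mapping; the storage and dashboard layers stay unchanged.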
End-to-end data engineering + applied ML project.
- Web scraping → relational model → processing with Polars/Spark → modeling (H2O AutoML)
- Focus on content/behavior features and responsible analysis
flowchart LR
W[Web Scraping] --> R[(Relational Tables)]
R --> P[Processing<br/>Polars / Spark]
P --> FS[Feature Set]
FS --> ML[Modeling<br/>H2O AutoML]
ML --> EV[Metrics + Insights]
Repo: https://github.com/PaulinaIA/moltbook-safety
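As a rough illustration of the "Processing → Feature Set" step above, the sketch below aggregates per-author content and behavior features using only the standard library. It is an assumption for illustration, not the repo's code: the record keys `author` and `text`, the function name `author_features`, and the three features are hypothetical. In the actual project this kind of aggregation would be expressed as a group-by in Polars or Spark.

```python
from collections import defaultdict


def author_features(posts: list[dict]) -> dict[str, dict[str, float]]:
    """Aggregate simple per-author features from scraped posts (hypothetical schema)."""
    texts_by_author = defaultdict(list)
    for post in posts:
        texts_by_author[post["author"]].append(post["text"])

    features = {}
    for author, texts in texts_by_author.items():
        n = len(texts)
        features[author] = {
            "post_count": n,                                         # behavior: activity level
            "mean_length": sum(len(t) for t in texts) / n,           # content: average post size
            "link_share": sum("http" in t for t in texts) / n,       # content: share of posts with links
        }
    return features


if __name__ == "__main__":
    sample = [
        {"author": "a", "text": "hello"},
        {"author": "a", "text": "see http://x.y"},
        {"author": "b", "text": "hi"},
    ]
    print(author_features(sample))
```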
Python · SQL · Spark · Polars · Airflow · Docker · AWS · scikit-learn · PyTorch · TensorFlow
Also: R · MATLAB · C/C++ · Embedded/IoT background
- I care about clarity (simple architecture, readable code)
- I prioritize reproducibility (structured repos, deterministic pipelines when possible)
- I like projects with meaning + impact, not only metrics
📩 pauliperalta@gmail.com
🔗 https://www.linkedin.com/in/paulina-peralta-916a46140/