Raúl Speroni
Principal Data Engineer and Technical Lead
Principal Data Engineer and technical lead working across data platforms, search quality, applied AI/ML, legal tech, and large-scale document processing.
- search quality and recommendation systems
- measurement, feedback loops, and production KPIs
- data platforms and document processing
- AI/LLM-enabled workflows with human review
- technical leadership for engineering teams
- open data and digital participation tools
- Spanish: native
- English: fluent
Profile
- Principal Data Engineer and technical lead with experience building search-quality systems, recommendation workflows, data platforms, document-processing systems, and applied AI/ML workflows.
- Currently lead ML/Data teams in legal tech, translating product ideas into phased technical plans, production systems, and measurable feedback loops.
- Strong background in Python, AWS, PostgreSQL, OpenSearch/Elasticsearch, distributed queues, ranking/retrieval workflows, document enrichment, and human-in-the-loop ML/LLM systems.
- Experienced in public-sector and independent projects involving open data, digital participation, and large document collections.
Professional Experience
Expert Institute
Principal Data Engineer + Tech Lead · New York, NY / Remote · 2022 - Present
Selected impact and responsibilities
- Lead ML/Data teams building production systems for legal documents, spanning recommendation workflows, search quality, ingestion, and enrichment.
- Built a recommendation workflow combining LLM case analysis, ontology mapping, semantic/vector evidence, OpenSearch ranking, eligibility filters, and calibrated confidence levels.
- Created the measurement and feedback loop around quality: production data models, recurring KPI reporting, recall-miss audits, AI-assisted feedback classification, remediation tracking, and post-release validation, informing product decisions about where automation was reliable enough.
- Operate a document-processing system that handles approximately 300K documents and 15M pages per month, using 24 AWS Lambda microservices, SQS, and a custom DynamoDB-based router.
- Own data storage and retrieval across PostgreSQL, S3, OCR outputs, JSON outputs, OpenSearch, full-text search, and vector-search workflows.
- Maintain PostgreSQL workloads of around 485 GB, with the largest table above 450M rows.
- Keep typical document turnaround at 10-12 minutes, with p95 below 50 minutes.
- Integrated deep-learning and LLM components using OpenAI and Gemini APIs for document summarization, parsing, and enrichment.
- Replaced manual data-entry workflows with automated enrichment pipelines.
- Oversee CI/CD, observability, and reliability using CloudWatch, Langfuse, Looker Studio, Streamlit, and custom dashboards.
Freelance and Volunteer Work
Senior Software Engineer / Project Lead · Montevideo, Uruguay · 2020 - Present
Selected impact and responsibilities
- Led product and software development for a foundation's Django project, coordinating a team of around six people.
- Built backend services, Django REST APIs, Celery tasks, and deployment infrastructure.
Gavelytics
Senior Data Engineer · Los Angeles, CA / Remote · 2021 - 2022
Selected impact and responsibilities
- Built data pipelines, ETL processes, data models, and data architecture for legal analytics products.
- Worked on data science, NLP, and MLOps workflows.
Municipality of Montevideo
Technical Project Management, Data Pipelines, Research, and Development · Montevideo, Uruguay · 2017 - 2021
Selected impact and responsibilities
- Designed and developed a data pipeline for retrieving, enriching, and visualizing social media mentions related to the city.
- Deployed a scalable microservices architecture that can connect new data sources and text/image processing modules.
- Helped the organization measure citizen interaction on social networks, detect events, and trigger alerts.
- Served as technical lead for a multichannel chatbot project, coordinating vendors and the internal development team.
- Researched and developed NLP and ML features for citizen-service channels, including WhatsApp.
e-Participation Project, Udelar - Agesic
Research, Architecture, and Tooling · Montevideo, Uruguay · 2019 - 2021
Selected impact and responsibilities
- Worked on a multidisciplinary project between the University of the Republic and the Uruguayan state to develop guidance for digital citizen participation.
- Directed the selection, adaptation, and deployment of open-source participation tools.
- Proposed an architecture for integrating those tools into a shared platform.
NLP Group, Faculty of Engineering, University of the Republic
Research Assistant · Montevideo, Uruguay · 2017 - 2019
Selected impact and responsibilities
- Worked on anonymization and automatic classification of judicial decisions in the Natural Language Processing group.
- Experimented with language-processing techniques for legal/judicial documents.
Magnesium Coop
Founding Partner, Developer, DevOps · Montevideo, Uruguay · 2015 - 2021
Selected impact and responsibilities
- Developed software and web-service projects for companies and organizations as a founding partner.
- Configured and maintained CI/CD pipelines for deploying and scaling services.
Uruguayan Postal Service
Full-stack Developer · Montevideo, Uruguay · 2014 - 2017
Selected impact and responsibilities
- Researched and developed the organization’s first high-availability services for receiving postal events.
Total Server Solutions
Linux Systems Administrator · Montevideo, Uruguay · 2007 - 2008
Selected impact and responsibilities
- Worked on Linux systems administration, attack detection and prevention, and systems/services monitoring.
Selected Independent Projects
draftingdocs.org
Collaborative drafting for sensitive documents. AI suggestions are treated as reviewable proposals that require explicit human approval, with traceability over changes and approvals.
memoria.uy
Open-source experiment on news perception in Uruguay. Users anonymously vote on whether news items are good or bad, and the system uses clustering to reveal aggregate patterns.
Cruzar
Engineering collaboration on a large historical archive project. The work focused on tools to extract, organize, and search information across more than 3M documents.
Marcha Virtual
Adapted an open-source system so people could participate virtually, through social networks, in Uruguay’s March of Silence during the COVID-19 pandemic.
Urgent Consideration Law Compared
Built a website using open data so citizens could compare article-by-article changes in Uruguay’s Urgent Consideration Law. The site reached 1.4M visits in six months.
Skills
Search, recommendations, and applied AI: OpenSearch, Elasticsearch, semantic search, full-text search, vector search, pgvector, ranking/reranking, confidence scoring, recall analysis, feedback classification, OpenAI API, Gemini API, Claude, human-in-the-loop workflows.
Data platforms and orchestration: PostgreSQL, MySQL, MongoDB, Redshift, DynamoDB, S3, dbt, AWS Lambda, SQS, Celery, Redis, Kafka, Luigi, ETL, data modeling, document ingestion, task routing, priority queues.
Programming and backend systems: Python, TypeScript, Java, Django, FastAPI, Flask, NestJS, React, REST APIs, domain-driven design.
Machine learning and NLP: Scikit-learn, spaCy, PyTorch, TensorFlow, Hugging Face Transformers, SageMaker, embeddings, document classification, information extraction, summarization, parsing.
Infrastructure and observability: Linux, Docker, Kubernetes, Docker Swarm, AWS, ECS, OpenShift, DigitalOcean, CI/CD, GitHub Actions, CloudWatch, Datadog, Grafana, QuickSight, Looker Studio, Streamlit, Langfuse, GitLab.
Participation and collaboration platforms: Discourse, Decidim.
Education
University of the Republic, Faculty of Engineering · Montevideo, Uruguay
- Master's in Data Science and Machine Learning · Paused
- Computer Engineering · 2017
- Computer Analyst · 2015
Cambridge University / Anglo Uruguayo · Paysandú, Uruguay
- First Certificate in English · 2003
Courses
- Media, Politics and Polarization · FIC Permanent Education · 2020
- SIGES Workshop · IM Training and Studies Center · 2020
- Deep Learning Specialization · Coursera / DeepLearning.AI · 2018
- Machine Learning · Coursera / Stanford · 2018
- Literary Motivation Workshop · Onetto Literary Workshops · 2016, 2017, 2020
Presentations
- Analysis of e-participation tools · Virtual Seminar on Open Government, Agesic · September 2020
- Extraction of events in a city from social networks · SIMBig 2018, Universidad del Pacífico, Lima, Peru · September 2018
- Analytical Intelligence Conference · Hotel Crystal Tower · December 2017
- Sample Engineering · Faculty of Engineering · October 2017 · Second prize awarded by the Institute of Computing
- 3rd Meeting of Smart Cities for Inclusion · Municipality of Montevideo · August 2017
Publications
- Martínez Puga, M., & Speroni, R. (2026). El ejercicio de despolarizar la discusión pública. Informatio, 31(1), e207. https://doi.org/10.35643/Info.31.1.12
- Steglich, M., Speroni, R., & Prada, J. (2019). Twitter Event Detection in a City. 5th International Conference, SIMBig 2018, Lima, Peru. https://doi.org/10.1007/978-3-030-11680-4_5