Ciao! I am
Luca Coraggio
I am a CSEF Fellow and a Postdoctoral Researcher at the Department of Economics and Statistics, University of Naples Federico II (Italy). I hold a Ph.D. in Economics, and my main research interests are in Machine Learning and Statistics.
In recent years I have been working on two connected lines of research. The first is on methodological statistics, with a particular focus on model-based clustering and criteria for selecting optimal clustering solutions. The second is devoted to the application of supervised and unsupervised learning methods to general problems in Economics, using state-of-the-art statistical methods (standard ML tools, deep learning, NLP, and computer vision) to exploit new sources of data, such as images and text.
I enjoy coding my own solutions, and I am fluent in several programming languages. My current top three: C, Python, and R.
-
Quadratic-scoring software
· w/ P. Coretto
Libraries and software for the Quadratic Scoring methods (QBH and QBS) from the JMVA paper. Roadmap: heavy lifting in C; interfaces in Python and R (at least).
-
JAQ of All Trades: Job Mismatch, Firm Productivity and Managerial Quality. SSRN Working paper
· w/ M. Pagano, A. Scognamiglio, J. Tåg
Does the matching between workers and jobs help explain productivity differentials across firms? To address this question we develop a job-worker allocation quality measure (JAQ) by combining employer-employee administrative data with machine learning techniques. The proposed measure is positively and significantly associated with labor earnings over workers’ careers. At firm level, it features a robust positive correlation with firm productivity, and with managerial turnover leading to an improvement in the quality and experience of management. JAQ can be constructed for any employer-employee data including workers’ occupations, and used to explore the effect of corporate restructuring on workers’ allocation and careers.
-
Newspaper article digitization for database construction
· w/ M. Vasca, G. de Blasio, R. Nisticò
We aim to digitize an archive of scanned articles from a well-known Italian newspaper. Our goal is to construct a queryable database, to be used later for studies in economics. This project is still at an early stage and involves: computer vision for article extraction (done); NLP for information retrieval and database construction (ongoing); regression analysis (future).
-
Folklore studies
· w/ G. Immordino, F. F. Russo
This is an early-stage project in which we use NLP methods to analyze folklore data. More details will follow.
-
(2023) Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score. Journal of Multivariate Analysis
· w/ Pietro Coretto
Cluster analysis requires fixing the number of clusters and often many hyper-parameters. In practice, one produces several partitions, and a final one is chosen based on validation or selection criteria. There exists an abundance of validation methods that, implicitly or explicitly, assume a certain clustering notion. In this paper, we focus on groups that can be well separated by quadratic or linear boundaries. The reference cluster concept is defined through the quadratic discriminant function and parameters describing clusters’ size, center and scatter. We develop two cluster-quality criteria that are consistent with groups generated from a class of elliptic–symmetric distributions. Using the bootstrap resampling of the proposed criteria, we propose a selection rule that allows choosing among many clustering solutions, possibly obtained from different methods. Extensive experimental analysis shows that the proposed methodology achieves a better overall performance compared to established alternatives from the literature.
-
(2021) Illicit drugs seizures in 2013–2018 and characteristics of the illicit market within the Neapolitan area. Forensic Science International
· w/ A. Silvestre, P. Basilicata, R. Guadagni, A. Simonelli, M. Pieri
The study presents results of toxicological analysis performed on seized material in the Neapolitan area in the period from 2013 to 2018. A constancy in THC and heroin percentages is evidenced (%THC ~10% and ~11.5% for marijuana and hashish; heroin: 20–24%), with mean values exceeding the European data. Data on cocaine revealed a constant increment of active principle percentage over the studied period (from 40% in 2013 to ~65% in 2018), with peak of 70% in 2017; also, number of samples exceeding the mean value increased over years. Active principles contents resulted higher than the ones reported in other Italian areas over the same period; marijuana was prevalent on hashish, confirming an Italian trend different from other European countries. A map of the Campania region evidenced two main “storage” districts, one corresponding to the city center and the second located in the northern part. If compared with literature data on the presence of local mafia, these areas are perfectly superimposable to those with the highest risk of homicides, thus confirming the degree of radicalization of local organizations and the relative weight of proceeds from drugs sale. Moreover, such radicalization within the territory seems to be the main reason of the absence of new psychoactive substances among the seized material.
PythonLab
A short, introductory course in Python programming. Level: undergraduate, 6 lectures, ~2h/lecture. The course is modeled on the Python tutorial, with a tilt toward data analysis and economics. The material is in Italian and is currently available on Moodle Unina (I plan to make it available on git). It includes:
- Slides covering programming concepts
- Exercises and solutions
- Scripts and advanced solved exercises
Tools for Data Analysis
A MOOC hosted on Federica Web Learning, part of the Labor, Development & Policy evaluation program. Available here: link.
The course introduces elements of programming and statistical methods for data analysis and data science. It reviews and uses the R and Python programming languages, as well as shell scripting, to interact with data (visualization, manipulation), automate tasks (web scraping, file management), and deploy machine learning methods. The course is hands-on: students work on mini-projects and practical exercises throughout the lectures; the theory behind the methods is touched upon, and references for self-study are provided. The course is aimed at students who want to acquire programming skills for working with data (ideally, they have already taken statistics courses).