ArXiv Preprint
Many scientific fields -- including biology, health, education, and the
social sciences -- use machine learning (ML) to help them analyze data at an
unprecedented scale. However, ML researchers who develop advanced methods
rarely provide detailed tutorials showing how to apply these methods. Existing
tutorials are often costly for participants, presume extensive programming
knowledge, and are not tailored to specific application fields. In an attempt
to democratize ML methods, we organized a year-long, free, online tutorial
series targeted at teaching advanced natural language processing (NLP) methods
to computational social science (CSS) scholars. Two organizers worked with
fifteen subject matter experts to develop one-hour presentations with hands-on
Python code for a range of ML methods and use cases, from data pre-processing
to analyzing temporal variation of language change. Although live participation
was more limited than expected, a comparison of pre- and post-tutorial surveys
showed an increase in participants' perceived knowledge of almost one point on
a 7-point Likert scale. Furthermore, participants asked thoughtful questions
during tutorials and engaged readily with tutorial content afterwards, as
demonstrated by 10K total views of posted tutorial recordings. In this report,
we summarize our organizational efforts and distill five principles for
democratizing ML+X tutorials. We hope future organizers improve upon these
principles and continue to lower barriers to developing ML skills for
researchers of all fields.
Ian Stewart, Katherine Keith
2022-11-29