Five months into our PhDs, we (Keiran and Noelyn) applied and were accepted to attend the April Data Study Group at the Alan Turing Institute in London. Data Study Groups are intensive five-day collaborative hackathons in which data scientists of all levels are brought together to solve real-world data problems submitted by Challenge Owners. Challenge Owners come from diverse backgrounds, e.g. industry, government, academia and the third sector, giving participants the opportunity to work on a wide range of problems that they wouldn’t encounter in their day-to-day work. The event takes place at the Alan Turing Institute, which is housed in the iconic British Library.
Unlike more traditional application processes that focus on CVs and cover letters, the application process for the Data Study Group asks participants to show off their technical skills (as well as their ability to collaborate and communicate) by sharing a portfolio of work that illustrates their strengths. Dr Kirstie Whitaker, a Turing Research Fellow, shared her thoughts (https://www.turing.ac.uk/blog/how-write-great-data-study-group-application) on what she looks for when assessing an application. Noelyn shared a Google Drive link containing the script and report from her MSc project on predicting footballers’ values, along with a script for a time series prediction model, both written in Python. Without prior work experience as a data scientist, she used her application to highlight her recently gained coding skills and her zeal to apply them, her ability to be a team player, and her desire to learn during the process. Keiran, on the other hand, sought to demonstrate his coding and teamwork skills by drawing on his experience of working in industry as part of a development team.
Once accepted, Noelyn’s greatest obstacle to attending was childcare; however, the organisers were very accommodating, suggesting she bring her children along and offering to provide suitable accommodation. Although she ultimately made other arrangements, this offer alone cemented her desire to be there and highlighted the organisers’ commitment to inclusivity.
The five days
Keiran was fortunate enough to be provided with accommodation in university halls just five minutes’ walk from the Turing Institute, making for an easy commute to the venue. This was particularly valuable given that the programme really is what it says on the tin (‘intensive five-day collaborative hackathon’), starting at 9am on the Monday (most participants arrived on the Sunday) and finishing at 4pm on the Friday. In between, participants work until 9pm, sometimes 10pm. This is made much more tolerable by the breakfasts, lunches and dinners provided, as well as an array of snacks, iPad-powered coffee and a fridge full of fizzy drinks.
The first day included registration, a briefing from the organisers, an introduction to the challenges by their owners, an icebreaker and group assignment, with group work beginning after lunch. Starting group work on the first day gives participants an opportunity to meet their fellow group members and scope out possible solutions. It is also an important opportunity to reconsider whether the group is the right fit, which is what Noelyn did: after speaking with the organisers, she had joined another group by the next day.
On the second, third and fourth days we were thrown straight in at the deep end. The end result is not meant to be a fully functioning solution; instead, it is a collation of several ways to tackle the problem, which the company can take forward and improve on. This meant that we, the participants, were not restricted in our approach and were free to apply our own expertise, while working with team members to ensure that the typical data exploration and pre-processing steps were undertaken. To keep the work cohesive and avoid duplication, each team had a facilitator who acted as the ‘project manager’. Here (https://www.turing.ac.uk/blog/data-study-group-researchers-perspective), Chanuki Seresinhe, a visiting researcher, talks about her role as a facilitator.
We ended up in the same group, working on a large dataset of training and user records provided by eGym (a company that develops and manufactures advanced products for the fitness market), alongside researchers from a range of backgrounds as well as the challenge owner. Given the nature of the Data Study Group, we were allowed free rein over the direction in which we took our investigation. This led to members of the team splitting off into smaller groups to work on subproblems. The two of us ended up working together, focusing on clustering and segmenting gym users based on their characteristics. This work could then be used to specialise later modelling processes, which aimed to estimate the performance of gym-goers based on their information and previous performances. Working collaboratively on this project was made possible by the Turing Institute’s cloud virtual machine system, Slack for communicating within and across teams, Overleaf for report writing and Git for managing the code repository.
Although the days may have been long, time was made for socialising in the evening, with a trip to the Namco Funscape arcade allowing the groups to bond as teams.
Whilst each of us became progressively more fixated on our respective corners of the group project, our facilitator organised regular catch-up sessions throughout each day, ensuring that we were all aware of each other’s work and how it might relate to our own, and keeping spirits up when things got tough. Beyond this, he ensured that we each documented our contributions, so that by the end of Thursday we had a cohesive report and presentation, which we proudly presented to the other participants, challenge owners and academics on Friday morning.
Final presentations were followed by lunch and a well-earned trip to the pub, where we were free to let our hair down and pat ourselves on the back after a frantic (but fun) week of work.