Turing data study group – April 2018

Five months into our PhDs, we (Keiran and Noelyn) applied and were accepted to attend the April Data Study Group at the Alan Turing Institute in London. Data Study Groups are intensive five-day collaborative hackathons where data scientists of all levels are brought together to solve interesting real-world data problems submitted by Challenge Owners. Challenge Owners typically come from diverse backgrounds, e.g. industry, government, academia and the third sector, giving participants the opportunity to work on a wide range of problems that they wouldn’t encounter in their day-to-day work. The event takes place at the Alan Turing Institute, located at the iconic British Library in London.

Unlike more traditional application processes that focus on CVs and cover letters, the application process for the Data Study Group asks participants to show off their technical skills (as well as their ability to collaborate and communicate) by sharing a portfolio of work that illustrates their strengths. Dr Kirstie Whitaker, a Turing Research Fellow, shared her thoughts (https://www.turing.ac.uk/blog/how-write-great-data-study-group-application) on what she looks out for when assessing an application. Noelyn shared a Google Drive link containing the script and report from her MSc project on predicting footballers’ values, along with a script for a time series prediction model, both written in Python. Without prior work experience as a data scientist, her application highlighted her recently gained coding skills and her zeal to apply them, her ability to be a team player, and her desire to learn during the process. Keiran, on the other hand, sought to demonstrate his coding and teamwork skills by drawing on his experience of working in industry as part of a development team.

Once accepted, Noelyn’s greatest hindrance to attending was childcare provision; however, the organisers were very accommodating, suggesting she bring her children along and offering to provide suitable accommodation. Although she ultimately made other arrangements, this alone cemented her desire to be there and highlighted the organisers’ commitment to inclusivity.

The five days

Keiran was fortunate enough to be provided with accommodation in university halls just a five-minute walk from the Turing Institute, making for an easy commute to the venue. This was particularly valuable given that the programme really is what it says on the tin (‘intensive five-day collaborative hackathon’), starting at 9am on the Monday (most participants arrived on the Sunday) and finishing at 4pm on the Friday. In between, participants work up until 9pm, sometimes 10pm. This is made much more tolerable by the breakfasts, lunches and dinners provided, as well as an array of snacks, iPad-powered coffee and a fridge full of fizzy drinks.

The first day included registration, a briefing from the organisers, an introduction to the challenges by their owners, an icebreaker and group assignment, with group work beginning after lunch. Starting group work on the first day gives participants an opportunity to meet other group members and scope out workable solutions. It is also an important opportunity to reconsider whether a group is the right fit, which is what Noelyn did: after speaking with the organisers, she joined another group the next day.

On the second, third and fourth days we were thrown straight into the deep end. The end result is not meant to be a fully functioning solution; instead, it is a collation of several ways to tackle the problem, which the company can take forward and improve on. This meant that we, the participants, were not restricted and could apply our own expertise, while working with team members to ensure that the typical data exploration and pre-processing steps were undertaken. To ensure cohesive working and avoid duplication of work, each team had a facilitator who acted as the ‘project manager’. Here (https://www.turing.ac.uk/blog/data-study-group-researchers-perspective), Chanuki Seresinhe, a visiting researcher, talks about her role as a facilitator.

We ended up in the same group, working on a large dataset of training and user records provided by eGym (a company that develops and manufactures advanced products for the fitness market), alongside other researchers from a range of backgrounds as well as the project owner. Given the nature of the Data Study Group, we were allowed free rein over the direction in which we took our investigation. This culminated in members of the team splitting off into smaller groups to work on subproblems. The two of us ended up working together, focusing on clustering and segmenting gym users based on their characteristics. This work could then be used to specialise later modelling processes, which aimed to estimate the performance of gym-goers based on their information and previous performances. Working collaboratively on this project was made possible by the Turing Institute’s cloud virtual machine system, Slack for communicating within and across teams, Overleaf for report writing and Git for the code repository.

Although the days were long, time was made for socialising in the evenings, with a trip to the Namco Funscape arcade allowing the groups to bond as teams.

Whilst each of us became progressively more fixated on our respective corners of the group project, our facilitator organised regular catch-up sessions throughout each day, ensuring that we were all aware of each other’s work and how it might relate to our own, and keeping spirits up when things got tough. Beyond this, he ensured that we each documented our contributions, so that by the end of Thursday we had a cohesive report and presentation, which we proudly presented to the other participants, challenge owners and academics on Friday morning.

Final presentations were followed by lunch and a well-earned trip to the pub, where we were free to let our hair down and pat ourselves on the back after a frantic (but fun) week of work.

CDT students show excellence across the board at 2019 Partner Event

The 3rd year student organisers of the second CDT partner event are happy to report on a successful day. The event was hosted again by the Leeds Institute for Data Analytics (LIDA) and while the essence of the day was similar to last year, this year saw the addition of mini masterclasses to the bill. The academic staff and student attendees from the four CDT institutions were again joined by representatives from partner organisations and got a glimpse of the work being undertaken by students.

Professor Mark Birkin, LIDA Director, opened proceedings with a short welcome speech before handing over to the University of Liverpool’s Professor Alex Singleton, who chaired the day’s first session of individual lightning talks by 3rd year students. These replaced last year’s group presentations and provided a three-to-four-minute snapshot of the research conducted for master’s dissertations or first papers. These bite-sized presentations highlighted the range of research areas and skills being used, from assessing the impact of the weather on high street retail to examining inequalities in cycling participation. Students took questions after their talks and, thanks to Dr Mark Taylor from the University of Sheffield asking us for our take-home message, we all now have an elevator pitch for our work.

This year’s poster session was taken on by the 2nd year students with feedback highlighting their excellent knowledge and enthusiasm for their work. Examples of work completed for core modules could be seen in posters detailing the use of web scraping and text analysis. (You can view the event posters here!) Again, the variety of topics and analysis methods on show highlights the diverse range of projects undertaken. There truly is something for everybody on the CDT.

The partner event was timed to coincide with the Introduction to Programming module, the first of the MSc, and we were joined at lunch by the new cohort of students. You hit the ground running with this module, particularly if you’re new to coding, so this year’s lunch was extended to include a Q&A session hosted by Dr Eleri Pound (Centre Manager) to cover any questions the new students had. A range of questions was submitted, and 2nd and 3rd year students were able to pass on words of advice and encouragement and, hopefully, alleviate any concerns.

The day’s final session was the mini masterclass. There was the option to sign up to one of three masterclasses: academic publishing, networking or public engagement. Feedback shows that attendees of each of the classes found them helpful and informative. The event overall was enjoyed by everybody, with comments showing that people found it interesting and engaging. Remember to follow our Twitter page, @DataCDT, to stay in touch with CDT students as we continue to work at the cutting edge of our subject areas.

Written by Melanie Green, Noelynn Onah & Rhiannon Thomas.

CDAS first Annual Partner Event

On the 18th September, the Centre for Data Analytics and Society held its first annual partner event at the Leeds Institute for Data Analytics (LIDA). Attended by academics from across the CDT institutions and representatives from partner organisations, the event proved a great opportunity for networking and for the students to share what’s been keeping them so busy in their first year.

The event was opened by LIDA Director, Professor Mark Birkin, who was key to the establishment of the CDT. It was then over to the students from each of the CDT institutions at the Universities of Leeds, Manchester, Liverpool and Sheffield. The students gave group presentations showcasing their learnings from the MSc modules, experiences of working with partners during internship projects, and how they’d already started applying their new data skills to their PhD topics. With research areas ranging from health to crime, transport, retail and more, the students displayed a broad use of data science techniques such as clustering and text analysis, including some ‘just for fun’ projects like Keiran’s analysis of Pokémon characteristics. The presentations gave a real flavour of the interdisciplinary nature of the CDT and a clear sense of collegiality was on show.

Marking the completion of the first year for the Data Analytics and Society CDT, the event also provided an opportunity for feedback and discussion from student, academic and partner perspectives. We’re excited that the ideas raised during the event have led to the launch of our new @DataCDT Twitter page and the set-up of thematic interest groups to promote collaboration and knowledge sharing across the institutions. We feel that this is especially important now that the CDT has grown in number, having recently welcomed a brand-new cohort of first year students.

Having completed day two of the Introduction to Programming module in Python, the new student cohort later joined the event for an informal poster and networking session. This was a chance to view academic posters prepared by each of the current students and to ask questions about their work and experiences so far, which seemed to fuel excitement and settle nerves in equal measure among the new students.

As the CDT enters its second year, we’re excited to work with new academics and partners and to see ongoing projects progress. A number of our students have already been getting out to share their preliminary research findings at conferences nationally and overseas. So watch this space and follow our Twitter page to stay in touch with our CDT students as they continue to work at the cutting edge of their subject areas.

Vicki Jenneson

Presentations from the students are below and posters from this event can be found on this page – https://datacdt.org/meet-the-students/student-posters2018/

CDAS Leeds presentation

CDAS Sheffield presentation

CDAS Manchester presentation

CDAS Liverpool presentation