Problem Statement

Researchers and academics at UCL need an in-depth breakdown of the other researchers working in their area. At present this requires a long and arduous manual search and verification process that can take months to complete; speeding it up would make research much more effective. The issue grows larger each year as the number of researchers increases, and because the exercise must be repeated annually, the burden gets worse every time it is redone.


Our Solution

Our solution automates this process for all areas of research. Scraping tools gather data from UCL-related websites, the data is cleaned, and researchers can then search for other academics using a range of filters and explore the results through various visualisations. The pipeline is refreshed automatically each year to pick up new publications and researchers, so the search stays fast and up to date.


Achievement & Impact

We have created an easy-to-use web app that lets any user search for modules and publications by keyword, making it simple to find fellow researchers and other essential information about their field of study. We have also implemented a module-to-SDG mapping feature that matches modules to relevant SDGs with high, consistent accuracy and presents the results through useful visualisations. In addition, there is a proof-of-concept tool that maps IHE research expertise areas to a provided set of key topic terms, which can readily be extended to include researchers and their respective fields automatically. Overall, we believe the tool makes it far easier for researchers to find both publications and fellow researchers, decreasing preparation time and increasing efficiency.


Meet the Team!

Varun W.

(View on LinkedIn)


Database Deployer, Researcher, Team Liaison

Kareem K.

(View on LinkedIn)


Back End Developer, NLP / Machine Learning Developer, Researcher

Albert M.

(View on LinkedIn)


Project Website Manager, Back End Developer, Front End Developer

What it does:

Scrapes publication and module data, maps it to key topics, and generates classifiers, with the aim of giving an overview of the extent of research activity already taking place at an organisation.

Project Background & Client Introduction

Our project came about due to a growing need within UCL's research administration: finding researchers, publications and other key information related to a given research topic. This search was taking longer each year because of the increasing number of articles, papers and other publications appearing on every topic. Through this need we were connected with a range of researchers, from professors and PhD students to sustainable development researchers. They all required a way to speed up the process and obtain more accurate data automatically, rather than repeating the search manually every year.



You may find the client's details below:

  • Neel Desai - neel.desai.13@ucl.ac.uk

  • Marilyn Aviles - marilyn.aviles@ucl.ac.uk

  • Prof Ann Blandford - ann.blandford@ucl.ac.uk

  • Dr. Simon Knowles - s.knowles@ucl.ac.uk



Project Goals

Our main project goal is to give all researchers a way to find and contact other researchers who are working, or have worked, in the same field of study. A further goal is to ensure that the RPS database can be scraped to find every paper linked to a given search term, which can then be used to identify all related researchers.



Requirement Gathering

We gathered requirements in the form of a MoSCoW list, created through a series of group and individual meetings with each of our clients to capture all the features and methods needed and the priority of each. We went over these features multiple times with our clients, adding several new ones and splitting other requirements into smaller parts.



Personas


Persona 1
Persona 2


Use Cases

Alison has trouble identifying researchers to collaborate with across the IHE. She would like to use this tool as a quick and efficient way of searching for researchers across different engineering fields. She wants to establish connections across UCL and monitor the progress of her colleagues.

Jonathan is a PhD researcher who would use this data tool to quickly find and sort through research topics that have been or are currently being investigated at his university. He aims to gain insight into the extent of UCL's involvement in promoting the UN's 2015 Sustainable Development Goals in both teaching and research.



Functional MoSCoW List

ID Requirement Description Priority
1 Scrape UCL research publications from Scopus by leveraging the Scopus API to gather the following data: {title, abstract, DOI, subject areas, index keywords, author keywords, Elsevier link, …} (see the sketch after this list). Must
2 Scrape UCL modules from the UCL module catalogue by leveraging the UCL API to acquire the following data: {description, title, ID, module lead, credit value, …}. Must
3 Produce an extensive set of keywords (CSV file) for UN SDGs (United Nations Sustainable Development Goals) and IHE (Institute of Healthcare Engineering) topics. Must
4 Use NLP to preprocess text for UCL module fields: {description, name} and Scopus research publication fields: {title, abstract, index keywords, author keywords} (see the sketch after this list). Must
5 Train a semi-supervised NLP model to map UCL module catalogue descriptions to UN SDGs (United Nations Sustainable Development Goals). Must
6 Train a semi-supervised NLP model to map Scopus research publications to IHE (Institute of Healthcare Engineering) research specialities and subject areas. Must
7 Provide the most up-to-date data on Scopus research publications and UCL course modules. Must
8 Django web application implemented and fully deployed. Must
9 Django keyword search functionality for Scopus research publications. Must
10 Train a supervised machine learning model, such as an SVM (Support Vector Machine), to reduce the number of false positives from the NLP model (see the sketch after this list). Should
11 Use the NLP model, trained on SDG-specific keywords, to make SDG mapping predictions for UCL research publications from Scopus. Should
12 Validate the NLP model using a string matching algorithm: count SDG-specific keyword occurrences and compare the resulting probability distribution with the one produced by the NLP model, trained on the same set of SDG-specific keywords; the comparison yields a similarity index in the range [0,1] (see the sketch after this list). Should
13 Machine learning visualisation using t-SNE clustering (via dimensionality reduction) and an intertopic distance map (via multidimensional scaling) for SDG & IHE topic mappings, deployed on Django (see the sketch after this list). Should
14 Visualisation of SDG results using Tableau (accessed through database credentials). Users should be able to view SDG sizes according to the number of students per module/department/faculty across UCL. Should
15 Walkthrough guide describing how to use the final product & system maintenance (for rerunning the scraping and retraining the NLP model on an annual basis). Should
16 Django keyword search functionality for UCL modules, from the module catalogue (bonus feature). Could
17 Django logic for visualising validation using similarity index, mapping from values in the range [0,1] to a red-green colour gradient [red, green]. Could
18 Django button for exporting Scopus keyword search results to a CSV file format (bonus feature for exporting UCL module keyword searches). Could
19 Django option for sorting rows of the NLP model results table, based on the validation similarity index (red-green colour gradient). Could
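
The following is a minimal sketch of how the Scopus scraping in requirement 1 might work, assuming access to the public Scopus Search API with an institutional Elsevier API key; the query string, pagination and selected fields are illustrative rather than the project's exact ones (abstracts and keyword lists may require the Abstract Retrieval API or an extended view in practice).

    import requests

    API_KEY = "YOUR-ELSEVIER-API-KEY"  # hypothetical placeholder
    SEARCH_URL = "https://api.elsevier.com/content/search/scopus"

    def fetch_ucl_publications(query="AFFIL(University College London)", count=25, start=0):
        """Return one page of Scopus search results as a list of dicts."""
        response = requests.get(
            SEARCH_URL,
            headers={"X-ELS-APIKey": API_KEY, "Accept": "application/json"},
            params={"query": query, "count": count, "start": start},
        )
        response.raise_for_status()
        entries = response.json().get("search-results", {}).get("entry", [])
        # Keep only the fields needed downstream; names follow the Scopus Search API JSON.
        return [
            {"title": e.get("dc:title"), "doi": e.get("prism:doi"), "link": e.get("prism:url")}
            for e in entries
        ]

    for publication in fetch_ucl_publications()[:5]:
        print(publication["title"], publication["doi"])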
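
A minimal sketch of the text preprocessing in requirement 4, using NLTK for stopword removal and lemmatisation; the exact normalisation steps and libraries used in the project may differ.

    import re
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    nltk.download("stopwords", quiet=True)
    nltk.download("wordnet", quiet=True)

    STOPWORDS = set(stopwords.words("english"))
    LEMMATIZER = WordNetLemmatizer()

    def preprocess(text):
        """Lower-case, strip non-letters, tokenise, drop stopwords, lemmatise."""
        text = re.sub(r"[^a-z\s]", " ", text.lower())
        tokens = text.split()
        return [LEMMATIZER.lemmatize(t) for t in tokens if t not in STOPWORDS and len(t) > 2]

    print(preprocess("Clean water and sanitation are central to SDG 6."))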
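
A minimal sketch of requirement 10: a supervised SVM that re-classifies the NLP model's positive matches to filter out false positives. The TF-IDF feature representation and the tiny labelled set are illustrative assumptions, not the project's actual training data.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Hypothetical labelled examples: 1 = genuine SDG match, 0 = false positive.
    texts = [
        "access to clean water and sanitation in rural communities",
        "waterfall model for managing software projects",
        "reducing child mortality through vaccination programmes",
        "mortality tables used in actuarial pricing",
    ]
    labels = [1, 0, 1, 0]

    classifier = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
    classifier.fit(texts, labels)

    print(classifier.predict(["community hygiene and safe drinking water initiatives"]))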
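
A minimal sketch of the validation in requirement 12: count SDG-specific keyword occurrences by string matching, normalise the counts into a probability distribution, and compare it with the NLP model's distribution to obtain a similarity index in [0,1]. The keyword excerpt and the use of cosine similarity as the comparison measure are illustrative assumptions.

    import math

    SDG_KEYWORDS = {  # hypothetical excerpt from the project's SDG keyword CSV
        "SDG 3": ["health", "wellbeing", "disease"],
        "SDG 6": ["water", "sanitation", "hygiene"],
    }

    def keyword_distribution(tokens):
        """Probability distribution over SDGs from raw keyword counts."""
        counts = {sdg: sum(tokens.count(kw) for kw in kws) for sdg, kws in SDG_KEYWORDS.items()}
        total = sum(counts.values()) or 1
        return {sdg: c / total for sdg, c in counts.items()}

    def similarity_index(p, q):
        """Cosine similarity between two SDG distributions, in the range [0, 1]."""
        keys = set(p) | set(q)
        dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
        norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
        return dot / norm if norm else 0.0

    tokens = ["water", "sanitation", "health", "water"]
    model_distribution = {"SDG 3": 0.3, "SDG 6": 0.7}  # hypothetical NLP model output
    print(similarity_index(keyword_distribution(tokens), model_distribution))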
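
A minimal sketch of the t-SNE visualisation in requirement 13: each document's topic vector (for example, its SDG probability distribution) is reduced to two dimensions and plotted as a cluster scatter. The random matrix below stands in for real model output.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    doc_topic = rng.random((200, 17))   # 200 documents x 17 SDG scores (dummy data)
    labels = doc_topic.argmax(axis=1)   # dominant SDG per document

    embedding = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(doc_topic)

    plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="tab20", s=10)
    plt.title("t-SNE clustering of SDG mappings (dummy data)")
    plt.savefig("tsne_sdg_clusters.png")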


Non-Functional MoSCoW List

ID Requirement Description Priority
1 Will have a responsive Django web interface Must
2 Web interface will be reliable and publicly available at all times Must
3 Product will be scalable for a constantly increasing number of modules and publications Must
4 Avoids any possible legal or licensing conflicts Must
5 Data integrity for publications and modules Must
6 Interface will be intuitive and easily usable Should
7 There will be a home button on a navigation bar for ease of browsing Should
8 Main thematic colour across the site will be #007FFF Should
9 Source code will be highly readable to any external user Could
10 Informative, multi-page visually pleasing user interface Would

Our Prototypes

Prototype 2 developed using Miro

Contact Us