Niko Hartline

Data Mining of Global Endangered Species Data

Description

Nature article featuring the mined data: https://www.nature.com/nature/journal/v546/n7656/full/nature22900.html

The aim of the project was to determine the global connection between threatened species and agriculture. It proposes a possible solution: closing "yield gaps," the suboptimal crop yields that typically occur in countries without the capital to invest in better agricultural technology.

To obtain data on endangered species in each country, the project was originally scoped to have a team manually transcribe mammal and bird data from the International Union for Conservation of Nature (IUCN) Red List of Threatened Species. When I was brought onto the project, I designed a data mining program and collaborated with a software developer to write Python scripts that extract the raw data from the Red List automatically. This eliminated more than 300 hours of manual data entry and allowed the project to expand to amphibians, reptiles, and plants in addition to the originally scoped animal groups.
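The aggregation step that follows extraction can be sketched roughly as below. This is a minimal illustration, not the project's actual scripts: the record layout (country, group, species, category), the function name count_threatened, and the sample rows are all hypothetical. The only outside fact it relies on is that the IUCN categories CR, EN, and VU are the ones classed as threatened.

```python
from collections import Counter

# IUCN categories classed as "threatened": Critically Endangered (CR),
# Endangered (EN), and Vulnerable (VU).
THREATENED = {"CR", "EN", "VU"}


def count_threatened(records):
    """Count distinct threatened species per (country, group).

    `records` is an iterable of (country, group, species, category) tuples.
    This layout is a hypothetical stand-in, not the Red List export format.
    """
    seen = set()
    counts = Counter()
    for country, group, species, category in records:
        key = (country, group, species)
        # Skip non-threatened categories and repeated rows for the same species.
        if category in THREATENED and key not in seen:
            seen.add(key)
            counts[(country, group)] += 1
    return counts


# Made-up placeholder rows, for illustration only.
sample = [
    ("Country A", "mammals", "Species 1", "EN"),
    ("Country A", "mammals", "Species 2", "LC"),  # Least Concern: not counted
    ("Country A", "birds", "Species 3", "VU"),
    ("Country A", "mammals", "Species 1", "EN"),  # duplicate row: counted once
    ("Country B", "amphibians", "Species 4", "CR"),
]

counts = count_threatened(sample)
for (country, group), n in sorted(counts.items()):
    print(country, group, n)
```

Keying the counts on (country, group) keeps each taxonomic group separate, which makes it straightforward to add groups such as amphibians, reptiles, and plants without changing the counting logic.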

The data were then organized in R and used to build models in JMP that explore the linkages between species endangerment and agricultural expansion. The models predict the number of species that might be saved by closing the yield gap as a way to meet increasing food demand.
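As a rough illustration of what exploring such a linkage can look like, the sketch below fits a straight line by ordinary least squares. It is an assumption-laden stand-in written in Python, not the models built in JMP, whose form is not described here. The function name fit_line, the choice of variables, and the numbers are placeholders rather than project data.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder numbers only: x could stand for a measure of agricultural area
# per country and y for a threatened species count. These are not real data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A fitted slope of this kind only describes an association in the data supplied to it; any prediction of species saved would depend on the actual model structure and inputs used in the project.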

The code for data mining, organizing, and visualizing the IUCN Red List data is available on my GitHub page.

Technology