December 5, 2018
Day 13: Nick Mortimer, CSIRO
Data arrives from all kinds of sources on a modern research vessel. After nearly two weeks at sea we have collected more than 40,000 images of the seafloor. Each day, as an engineer and data scientist, I try not to be overwhelmed by the flow of data. Data velocity is key to any large-scale research: we have a finite amount of time and lots of data that needs to be processed to deliver scientific outcomes. My goal is to increase data velocity. Velocity has two components: speed and direction. Onboard Investigator, I concentrate on speeding up the flow of data towards the scientific outcomes needed for successful research.
It’s a job that I could never have imagined doing as a young boy. In those days, I loved pressing flowers, knitting and long division. Over time, that developed into a passion for electronics, programming and building things. Forty years after getting my first 8-bit computer with 4 KB of memory, I’m on a ship with high-speed internet and a compute cluster with over 240 processors and 600 GB of memory, watching high-definition video combined with stereo still images coming from the seafloor more than one thousand metres below me. The pictures are coming back to the ship in real time over a fibre optic cable. This technology allows us to know exactly where images are taken, despite having nearly 2000 metres of cable out behind the ship.
After every deployment of the camera system I run my Python scripts to check the data: renaming and organising the data feeds, checking quality, and extracting and analysing information from the images. Meanwhile Chris, a colleague back on shore in Hobart, logs into the ship and fires up a Docker container on our cluster to ingest the imagery into our video annotation system. Candice and Mark, who head up our imaging team onboard, use the calibrated stereo images to draw accurately measured quadrats on images for assessment of deep-sea coral cover.
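For readers curious what this kind of post-deployment housekeeping can look like, here is a minimal sketch. It is not the actual code run on board: the function names, the deployment ID and the naming scheme are all hypothetical, chosen only to illustrate renaming image files consistently and flagging obviously bad ones.

```python
from pathlib import Path


def plan_renames(filenames, deployment_id):
    """Map each original filename to a new, sequentially numbered name.

    The "<deployment>_<sequence><extension>" scheme is an assumption made
    for illustration, not the naming convention used on the voyage.
    """
    plan = {}
    for index, name in enumerate(sorted(filenames), start=1):
        suffix = Path(name).suffix.lower()  # normalise ".JPG" to ".jpg"
        plan[name] = f"{deployment_id}_{index:05d}{suffix}"
    return plan


def find_empty_files(sizes):
    """Return the names of files whose recorded size is zero bytes.

    `sizes` is a dict of filename -> size in bytes; a zero-byte image is
    treated here as a simple stand-in for a failed transfer.
    """
    return sorted(name for name, size in sizes.items() if size == 0)


# Hypothetical filenames and sizes, for illustration only.
plan = plan_renames(["b.JPG", "a.jpg"], "DEP01")
empty = find_empty_files({"a.jpg": 0, "b.JPG": 2048})
```

Working out the renames as a plan first, rather than renaming files straight away, means the mapping can be reviewed or logged before anything on disk is touched.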
Although we work 12-hour shifts at sea, the prolonged separation from family and friends often makes you reflect on who you are and what you do. The term data science was first coined in 2008; since then it has come to the fore in popular culture. Sometimes I ask myself: am I a data scientist, an engineer, or a programmer? Earlier this year I was at a data science conference in Boston and met Drew Conway, famous for his Venn diagram of data science. He listed three components: hacking skills, math and statistics, and substantive expertise. Working at CSIRO, where you are surrounded by internationally acknowledged experts in almost every field, it can be difficult to get an idea of the level required for each of these skills. I have great hacking skills, an ability to get things working, nearly 20 years of writing code to analyse scientific data for oceanographic research, and an ongoing love for long division: so I guess I could be a data scientist!