Technical Debt: The Achilles’ heel of data science

By Chip Oglesby

Imagine this:

It’s six o’clock and you’re still at work. There is still some work to be done on a project and you just want to be done already. There’s a quick way to do something, but there’s also the right way to do it, which would be harder.

Which do you choose?

The biggest obstacle I see in most real-world projects is that people have chosen the quick and easy way to do something right now instead of the best solution.

This is what we call technical debt and it’s the one thing that will kill your work and your projects every single day.

You may have fixed the problem in the short term, but you’ll end up spending more and more time on it in the long run because you were too impatient to plan it right from the beginning.

How can we overcome technical debt?

To begin with, you may consider how you could think computationally.

You would begin by thinking about how you might explain a task to a computer, or to someone who knows nothing about what you do. Every task has steps, and those steps form an algorithm that makes the task repeatable and reproducible.

How does this apply to data science?

If you consider yourself a “data scientist” but not a “programmer” you may have a workflow like this:

  1. A C-level executive requests a report on something that they’re remotely interested in.
  2. You fire up your database of choice, run a query and download data.
  3. You import data into Excel, create some basic formulas and save your work.
  4. You then type up your findings in Microsoft Word and send off your report.

That’s if you’re lucky, didn’t make any mistakes, and the person requesting the work doesn’t ask any follow-up questions.

I use this as an example because it raises some interesting questions:

  1. Is this the best way for me to do this?
  2. Is this a report that could be ongoing and needed more than once?
  3. Is there a better way that I could get data, analyze it and create reports/charts?
  4. If I were unable to come to work, would someone be able to analyze what I’ve done and continue my work?

If you answered no to any of these questions, you probably have some technical debt to overcome. The same goes for programmers who know better but still take shortcuts to get tasks done on time or under budget.

If this interests you, there’s a Coursera class you can take.
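As a rough illustration of what a scripted version of the four-step workflow above might look like, here is a minimal Python sketch. Everything specific in it is an assumption made up for the example: the `sales` table, its `region` and `amount` columns, the query and the function names. It uses an in-memory SQLite database only so that it runs anywhere; your real database, formulas and report format will differ.

```python
import sqlite3
from statistics import mean

# Hypothetical query. Kept in the script so it can be versioned and rerun,
# instead of living in someone's query history.
QUERY = "SELECT region, amount FROM sales ORDER BY region"


def fetch_rows(conn):
    """Step 2 of the manual workflow: run the query and pull the data."""
    return conn.execute(QUERY).fetchall()


def summarize(rows):
    """Step 3: the 'basic formulas', written down as code instead of cells."""
    by_region = {}
    for region, amount in rows:
        by_region.setdefault(region, []).append(amount)
    return {
        region: {"total": sum(amounts), "average": mean(amounts)}
        for region, amounts in by_region.items()
    }


def render_report(summary):
    """Step 4: turn the numbers into text that can be saved or sent."""
    lines = ["Sales summary by region"]
    for region, stats in sorted(summary.items()):
        lines.append(
            f"{region}: total={stats['total']:.2f}, average={stats['average']:.2f}"
        )
    return "\n".join(lines)


def make_demo_connection():
    """Build a throwaway in-memory database (made-up data) for the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 100.0), ("East", 300.0), ("West", 50.0)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_connection()
    print(render_report(summarize(fetch_rows(conn))))
```

Because each step is a named function, a follow-up question means changing one function and rerunning the script, and a colleague can read the file to see exactly what was done.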

But this task is really urgent

“We do this because we always fight the ‘tyranny of the urgent.’” I hear this a lot. I would argue that few things in most businesses are actually urgent. If Google goes down or a website goes offline, that’s urgent, but just because someone is hot on a task doesn’t mean it has to be urgent to you.

Summary

  1. The quick and easy solution works in the short term but not in the long term.
  2. If you can learn to think computationally, you can learn how to be better at your job.
  3. Very few things you actually do at work are urgent. Spend more time preparing for tasks than simply doing them.

Further Reading

I’ve spent some time thinking about this before and you can read more here:

  1. Why I’m Fascinated With Reproducible Code
  2. A Reproducible Research Template for R
  3. Programming for data analysts
  4. What does learning how to code actually teach us?
  5. The 500 folder problem: A Real World Automation Example
  6. How I became enamored with automation
  7. Automating reports in R and version control with a shell script