The concepts that I am referring to as blueprinting and prototyping are hardly deserving of such fancy titles. And what I am about to write hardly deserves its own post. Yet these incredibly simple but powerful concepts deserve any coder’s attention.
Before you write any code, spend some time describing a blueprint of what you want your code to achieve. Code can be hard to grasp, especially densely written logic. Keeping track of nested loops, variables, and interim tables will test the RAM of even the most experienced of coders! It is easy to feel lost even when reviewing code you have written yourself, especially if returning to it after a break.
I recommend writing code in a sequence of scripts (or discrete blocks with clearly demarcated functionality), with the first in that sequence being a plain-English description of what each script is intended to accomplish. The simple exercise of writing down each script’s functionality and its connection to the overall purpose of the predictive modeling exercise not only improves documentation, it can also help you write more organized, streamlined code.
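To make the blueprint idea concrete, here is a minimal sketch of what that first "script" might look like if you keep it in Python rather than a text file. The project (churn prediction), the script names, and the `render_blueprint` helper are all hypothetical; treat them as placeholders for your own steps.

```python
"""Blueprint for a hypothetical churn-prediction project (all names are illustrative)."""

# Each entry: (script name, plain-English purpose, link to the overall goal).
BLUEPRINT = [
    ("01_load.py", "Read the raw extract and keep only the needed columns.",
     "Gives every later script one clean point of entry."),
    ("02_clean.py", "Fix data types, handle missing values, drop duplicates.",
     "The model should not learn from data-entry errors."),
    ("03_features.py", "Build the predictor variables from the cleaned table.",
     "These are the inputs the model will be trained on."),
    ("04_model.py", "Fit the model and record its hold-out performance.",
     "Produces the predictions the exercise exists to deliver."),
]


def render_blueprint(steps):
    """Format the blueprint as readable text, one script per paragraph."""
    lines = []
    for name, purpose, why in steps:
        lines.append(f"{name}\n  What: {purpose}\n  Why:  {why}")
    return "\n\n".join(lines)


print(render_blueprint(BLUEPRINT))
```

A plain text file would do the job just as well. The only point of keeping it next to the code is that it is harder to forget to update.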
Especially when dealing with large data sets, it can be extremely inefficient to write and test code on the full target data. Try to find a clean point of entry for data into your flow of operations, then write and test your code together by dripping small amounts of target data through it. For example, use only the top (or a random) one thousand rows of data, rather than millions, to complete your code development. Prototype your solution with small amounts of data and, once satisfied, run the full data set through it.
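Here is one way the "top or random thousand rows" idea could look in practice. This is a sketch using only the Python standard library, and it assumes the data arrives as CSV with a header row. The function names `head_rows` and `sample_rows` are my own, and the random version uses reservoir sampling so the full file never has to sit in memory.

```python
import csv
import io
import itertools
import random


def head_rows(lines, n=1000):
    """Return the header and the first n data rows from an iterable of CSV lines."""
    reader = csv.reader(lines)
    header = next(reader)
    return header, list(itertools.islice(reader, n))


def sample_rows(lines, n=1000, seed=0):
    """Return the header and a uniform random sample of up to n data rows.

    Uses reservoir sampling, so the input is read once and never fully stored.
    """
    rng = random.Random(seed)
    reader = csv.reader(lines)
    header = next(reader)
    reservoir = []
    for i, row in enumerate(reader):
        if i < n:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = row
    return header, reservoir


# Stand-in for a large file; in practice, pass an open file handle instead.
fake_file = io.StringIO(
    "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(100_000))
)

header, rows = head_rows(fake_file, n=1000)
fake_file.seek(0)
_, sampled = sample_rows(fake_file, n=1000)

print(header, len(rows), len(sampled))
```

The top rows are the cheapest to grab, but they may not be representative if the file is sorted. The random sample costs one full pass over the data and usually gives a fairer prototype.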
Yep, told you the ideas don’t look like they deserve calling out in so many words. However, many who are new to coding, especially in data-rich environments, tend to overlook them. I promise you, these two simple concepts can save you hours, if not days, of rework!