This might seem like an incredibly dull topic, and in some respects it is! However, organizing your work in a sensible folder structure will supercharge your predictive modeling efforts! There are few things in advanced computational modeling work that are so simple to do and yet have such an outsized effect on the quality of your output.

So, what is a sensible folder structure? I know what it is for me, given my applications, but it will likely be different for you given your modeling environment, applications, and personal preferences. Here are some tips to keep in mind:

  1. Be aware that some software doesn’t like spaces in folder names. Avoid them if possible, or check that your modeling software won’t mind reading from or writing to a folder with spaces in its name.
  2. When doing similar workflows for different clients or problems, think carefully about standardizing your folder structure for each of those workflows. This will be tremendously rewarding later, when you are trying to figure out what you did three months ago for that modeling assignment.
  3. Think about the key pieces of the workflow. There is the data you start from: put it in its own folder. The work you do can go under “Work”, and can then be organized further (e.g. an “R” folder and a separate “SQL” folder, to keep scripting workflows separate). There can also be a “Report” sub-folder that contains the output from your work that is geared towards your audience.
  4. Try very hard to avoid a cluttered look in your folders. The folders should not look busy, and should be intuitive to navigate. Do not keep multiple versions of the same file side by side. Instead, move older versions to an “Old” folder within the sub-folder that you are in. This avoids confusion and reduces clutter.
  5. I find it useful to suffix file names with the date as YYYYMMDD. For example, “Report X – 20191214.doc”. This makes it a lot easier to sort files, and to see immediately which version of the file is the latest. With the year first, an alphabetical sort is also a chronological sort, which is not true of schemes such as DDMMYYYY or MMDDYYYY.
  6. Be consistent with capitalization. Avoid mixing all upper-case, all lower-case, and mixed-case folder names, and avoid the absolute worst: misspelled folder names! That just looks lazy! How can we trust someone’s advanced computational work when they are not careful with how they spell?
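
Several of the tips above can be sketched in a few lines of code. What follows is a minimal Python sketch rather than a prescribed layout: the folder names in `SUBFOLDERS`, the helper names `create_project`, `dated_name`, and `archive_older`, and the use of underscores instead of spaces in file names are all assumptions made for illustration, so adapt them to your own environment.

```python
from datetime import date
from pathlib import Path

# Hypothetical standard layout (tip 3); rename these to suit your own workflow.
SUBFOLDERS = ["Data", "Work/R", "Work/SQL", "Report"]


def create_project(root, name):
    """Create the same sub-folders for every project (tip 2) and return its path."""
    if " " in name:
        # Tip 1: some software dislikes spaces in folder names.
        raise ValueError(f"Folder name contains spaces: {name!r}")
    project = Path(root) / name
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


def dated_name(stem, ext, on=None):
    """Suffix a file name with YYYYMMDD (tip 5), using underscores, not spaces."""
    on = on or date.today()
    return f"{stem}_{on:%Y%m%d}{ext}"


def archive_older(folder, stem):
    """Move all but the newest dated version into an "Old" sub-folder (tip 4).

    Relies on YYYYMMDD suffixes sorting chronologically as plain text.
    Returns the path of the newest version, or None if there are none.
    """
    folder = Path(folder)
    versions = sorted(folder.glob(f"{stem}_*"))
    if not versions:
        return None
    old = folder / "Old"
    old.mkdir(exist_ok=True)
    for path in versions[:-1]:
        path.rename(old / path.name)
    return versions[-1]
```

A call such as `create_project("Projects", "Client_A")` would set up the folders, and `archive_older(project / "Report", "Report_X")` would leave only the most recent report in view. The sort inside `archive_older` only behaves this way because the date is written year first.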

That’s it for now – did I miss something important? Let me know. I’ll take your suggestions and improve this post over time.


