A model, in the context of *predictive modeling*, is simply a useful mathematical abstraction of reality. A model should be (a lot) less complex than the reality it is intended to represent. This might seem obvious, but with ever-increasing computational power at our fingertips, we need to carefully assess the complexity of our model relative to the reality it describes.

The construction of a model starts with a series of assumptions and an objective. The objective could be understanding a real-world process better, explaining what happened in the past, or making a prediction about the future. The assumptions include the set of data that we think influences the outcome and the sort of relationship (e.g. linear) that data has with the outcome. Those are the big ones anyway; then there are a whole bunch of little ones along the way. It is important to document assumptions and think of ways to test them as you are building your model.
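
To make "test your assumptions" a bit more concrete, here is a minimal sketch of one way to probe a linearity assumption. It fits a straight line and then checks whether the leftover errors (residuals) still follow a curved pattern. Everything here is illustrative: the function names (`fit_line`, `r_squared`, `curvature_signal`) and the toy data are made up for this example, and the curvature check is a rough diagnostic rather than a formal statistical test.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has no spread."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((p - a_bar) * (q - b_bar) for p, q in zip(a, b))
    var_a = sum((p - a_bar) ** 2 for p in a)
    var_b = sum((q - b_bar) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def curvature_signal(xs, ys, intercept, slope):
    """Hypothetical diagnostic: correlate residuals with the squared,
    centered input. Values near 0 are consistent with a linear
    relationship; values near +1 or -1 suggest leftover curvature."""
    x_bar = mean(xs)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    curve = [(x - x_bar) ** 2 for x in xs]
    return pearson(residuals, curve)


# Toy data (assumed for illustration only).
xs = list(range(10))
# Roughly linear: y = 2x + 1 with a small alternating wobble.
linear_ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]
# Clearly non-linear: y = x squared.
curved_ys = [x ** 2 for x in xs]

for label, ys in [("linear", linear_ys), ("curved", curved_ys)]:
    intercept, slope = fit_line(xs, ys)
    print(
        label,
        "R^2 =", round(r_squared(xs, ys, intercept, slope), 3),
        "curvature =", round(curvature_signal(xs, ys, intercept, slope), 3),
    )
```

The point of checking two numbers is that a straight line can score a high R² on curved data, so goodness of fit alone would not flag the broken assumption. The residual check is what exposes it, which is why it is worth writing down each assumption next to the check you used for it.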

Another thing that is often overlooked is that a model is not simply a mathematical formula. Unless your audience has 100% of the context you hold in your head (rare!), a model includes not only the mathematical bits (e.g. code, Excel, wherever else the math resides) but also the documentation, including the assumptions. Further, it is always nice to have an executive summary or overview to make it easier for your audience to consume your work.