Kevin Hillstrom: MineThatData

Exploring How Customers Interact With Advertising, Products, Brands, and Channels, using Multichannel Forensics.

April 19, 2009

49 Vital Multichannel Modeling Tips For Analysts And Statisticians

Here are 49 Multichannel Modeling Tips, for those of you who know how to interpret a "Wald" Statistic.

Tip #1: More business insight can be gained by modeling response over the course of a year than can be gained by modeling response for a campaign.

Tip #2: If you are modeling more than 100,000 customers, t-statistics are useless in Ordinary Least Squares Regression. You'll probably need a t-score of 10 or 25 to detect true significance. And at 1,000,000 customers, this measure increases to 50 or 100.
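
Here is a rough sketch of why t-statistics balloon on big files. It assumes the simplest case, a one-predictor regression, where the t-statistic for the slope can be written from the sample correlation r and the number of customers n as t = r * sqrt(n - 2) / sqrt(1 - r^2). The correlation of 0.01 is a hypothetical value picked for illustration, and the function name is mine.

```python
import math

def slope_t_statistic(r, n):
    """t-statistic for the slope in a one-predictor OLS regression,
    given the sample correlation r and the number of customers n."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Hypothetical predictor that explains only 0.01% of the variance (r = 0.01).
for n in (1_000, 100_000, 1_000_000):
    print(n, round(slope_t_statistic(0.01, n), 2))
```

With r = 0.01 the t-statistic is about 0.32 at 1,000 customers, about 3.16 at 100,000, and about 10 at 1,000,000. The same negligible effect clears the usual 1.96 cutoff simply because the file got bigger.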

Tip #3: For many businesses, seasonality matters. Create variables to detect seasonality.

Tip #4: Recency usually isn't a linear relationship. Often, the square root transformation works well for recency.

Tip #5: If you are modeling just the twelve month file, try using dummy variables for each month of recency. This allows you to detect the right transformation to use if recency is a continuous variable, and can help you detect seasonality.
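
Tips #4 and #5 can be sketched together in a few lines of Python. This is only an illustration: the 1-to-12 coding of recency, the variable names, and the choice of month 12 as the reference level are all assumptions of mine.

```python
import math

def recency_features(months_since_purchase):
    """Build recency predictors for one customer on the twelve-month file.

    months_since_purchase: integer 1..12 (hypothetical coding, 1 = most recent).
    Returns the square-root transform plus eleven month dummies; month 12
    is the reference level, so it gets no dummy of its own.
    """
    if not 1 <= months_since_purchase <= 12:
        raise ValueError("expected a recency of 1 to 12 months")
    features = {"sqrt_recency": math.sqrt(months_since_purchase)}
    for month in range(1, 12):
        features[f"recency_m{month:02d}"] = int(months_since_purchase == month)
    return features

print(recency_features(3))
```

One way to use this: fit the model with the dummies only, plot the eleven coefficients against month, and see whether the shape looks like a square-root curve. If it does, the single sqrt_recency variable is probably a reasonable substitute.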

Tip #6: Dummy variables are beautiful!

Tip #7: Parsimony matters. A model that has four variables is usually better than a model that has four-hundred variables.

Tip #8: Models with four variables are boring. Build your model with four-hundred variables, so that you can gain as much business insight as possible. Then implement the model with four variables.

Tip #9: Decide whether you will specialize in business insight, or mathematical brilliance. Don't focus on both. Your career path depends upon which path you choose.

Tip #10: Business leaders are yearning for business insight, the kind gleaned from statistical models.

Tip #11: Business leaders hate geeky math ... unless, of course, the business leader is a lover of geeky math.

Tip #12: In multichannel modeling, recency, frequency, and monetary information capture the vast majority of variability.

Tip #13: If frequency and monetary value are highly correlated, use frequency in your Logistic Regression models, and use average order value in your Ordinary Least Squares spending models.

Tip #14: Some people are able to get away with violating about 22,483 assumptions, combining response and spend models into one model analyzed with Ordinary Least Squares regression.

Tip #15: Residual analysis across 295,483 customers is a challenge! Summarize across individual values in your variables, and then conduct your residual analysis.
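
One possible reading of "summarize, then analyze" is to average the residuals within each distinct value of a predictor, so you inspect a dozen numbers instead of hundreds of thousands. The sketch below assumes that reading; the function name and the toy data are hypothetical.

```python
from collections import defaultdict

def mean_residual_by_value(values, actuals, predictions):
    """Average residual (actual - predicted) for each distinct value
    of one predictor, e.g. months of recency or number of orders."""
    totals = defaultdict(lambda: [0.0, 0])  # value -> [sum of residuals, count]
    for value, actual, predicted in zip(values, actuals, predictions):
        bucket = totals[value]
        bucket[0] += actual - predicted
        bucket[1] += 1
    return {value: total / count for value, (total, count) in sorted(totals.items())}

# Hypothetical toy data: recency in months, actual spend, predicted spend.
recency = [1, 1, 2, 2, 3]
actual = [50.0, 70.0, 30.0, 20.0, 10.0]
predicted = [55.0, 55.0, 30.0, 30.0, 5.0]
print(mean_residual_by_value(recency, actual, predicted))
```

If the averages drift steadily up or down as the predictor increases, the model is probably missing a transformation for that variable.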

Tip #16: Enter into a drawing for a free copy of my "Hillstrom's Database Marketing" book if you know how to apply the "Durbin-Watson" statistic to multichannel modeling issues.

Tip #17: Annual response/spend models should be created for each important channel you analyze.

Tip #18: Consider a dummy variable for each micro-channel you analyze.

Tip #19: Use separate dummy variables for each important combination of paid search engine (Google, Yahoo!, MSN), branded/non-branded, and any important keywords. There is a richness of business intelligence to be understood from these variables!

Tip #20: If you have an Amazon store, be sure to have a dummy variable that captures the impact of Amazon purchases on your business.

Tip #21: Consider using dummy variables that capture the impact of geography on your business --- urban, suburban, and rural variables do make a difference.

Tip #22: If your business generates sales from stores, use dummy variables to capture the impact of being 0-4, 5-9, 10-25, 26-50, and 51+ miles from a store.
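
A minimal sketch of distance-band dummies follows. It assumes the third band starts at 10 miles so that no distance is left unassigned, and it treats each band's upper edge as exclusive so fractional distances land somewhere. The names are hypothetical.

```python
# Upper bounds (exclusive) and labels for each band; the last band is open-ended.
# Assumption: the third band starts at 10 miles so no distance is left unassigned.
DISTANCE_BANDS = [(5, "0-4"), (10, "5-9"), (26, "10-25"), (51, "26-50")]
LAST_BAND = "51+"

def distance_dummies(miles_to_store):
    """Return one 0/1 dummy per distance band for a single customer."""
    if miles_to_store < 0:
        raise ValueError("distance cannot be negative")
    label = LAST_BAND
    for upper, name in DISTANCE_BANDS:
        if miles_to_store < upper:
            label = name
            break
    all_labels = [name for _, name in DISTANCE_BANDS] + [LAST_BAND]
    return {f"dist_{name}": int(name == label) for name in all_labels}

print(distance_dummies(10))
```

When these go into a regression with an intercept, leave one band out as the reference level, otherwise the dummies are perfectly collinear.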

Tip #23: Consider building completely different models for estimating who will purchase during the holiday shopping season.

Tip #24: Web Analytics click data ages QUICKLY. Catalog purchase data ages SLOWLY. Be sure to capture the impact of each dynamic in your models.

Tip #25: Model the recency of clicks in your e-mail campaigns. But remember, e-mail campaign purchases are much more important than e-mail campaign clicks.

Tip #26: Include dummy variables that record the "source of acquisition" for each customer. Many acquisition sources produce customers with minimal future value.

Tip #27: Include dummy variables that record "when" customers were acquired. A customer acquired twenty years ago spending $100 in 2008 is worth more than a customer acquired in 2006, spending $100 in 2008.

Tip #28: If you are a Web Analyst, seriously consider making the transition to statistical analysis.

Tip #29: More variables matter when predicting response than when predicting spend. Often, spend can be well predicted by frequency and average order value.

Tip #30: Throw out many of the traditional "rules" that statisticians require when modeling hundreds of thousands or millions of customers. Modeling customer behavior is more of an art than a science.

Tip #31: Get yourself a REALLY FAST rocket-ship of a computer! At one company, I literally offered to buy my own hardware, when the finance department refused my capital request for new equipment. One month later, I resigned.

Tip #32: Use software that you are comfortable with. That might be "R", SPSS, SAS, Statistica, or Excel/SQL. Whatever it is, use what you like to use.

Tip #33: If the business conditions that existed when you built your model no longer exist, BE CAREFUL!

Tip #34: When working with different micro-channels (Twitter, Facebook), use age-based dummy variables if possible. In other words, create dummy variables for 18-29 year olds, 30-39 year olds, you get the picture. Go ahead and use any age range you wish.

Tip #35: Very interesting findings await the multichannel statistician who runs a Factor Analysis.

Tip #36: Business leaders intuitively know what an "A" customer is, compared with a "B", "C", "D" or "F" customer. So instead of reporting on different predictions, simply grade your customers, and speak to business leaders in terms of grades.
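
Grading can be as simple as ranking customers by model score and cutting the ranked list into groups. The sketch below assumes five equal-sized groups, which is my choice and not a rule; the customer ids and spend figures are made up.

```python
def grade_customers(scores, grades="ABCDF"):
    """Assign a letter grade to each customer from a model score.

    scores: dict of customer id -> predicted value (higher is better).
    Customers are ranked by score and split into equal-sized groups,
    one per grade; equal-sized groups are an assumption, not a rule.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    return {
        customer: grades[rank * len(grades) // n]
        for rank, customer in enumerate(ranked)
    }

# Hypothetical predicted annual spend for ten customers.
predicted = {f"c{i}": spend for i, spend in enumerate(
    [310, 12, 95, 48, 150, 7, 220, 61, 33, 180])}
print(grade_customers(predicted))
```

You may prefer unequal groups, say a small "A" segment and a large "F" segment, if that matches how your business leaders think about the file.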

Tip #37: If you are analyzing categorical data, give Correspondence Analysis a try --- one of my personal favorites!

Tip #38: Never run a Neural Network model and then try to explain the results to a person who does not know how to run models but thinks s/he knows how to interpret regression models.

Tip #39: Use dummy variables for each store your business generates sales from.

Tip #40: Interaction terms MATTER! This is where you learn that "Multichannel Customers Are Not Always The Best Customers"!
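
For anybody who has not built one, an interaction term between two channel dummies is just their product. A small sketch, with hypothetical variable names:

```python
def channel_features(bought_online, bought_in_store):
    """Main-effect dummies plus their interaction for one customer."""
    online = int(bought_online)
    store = int(bought_in_store)
    return {
        "online": online,
        "store": store,
        "online_x_store": online * store,  # 1 only for multichannel customers
    }

print(channel_features(True, True))
print(channel_features(True, False))
```

If the fitted coefficient on online_x_store comes out negative, multichannel customers are worth less than the sum of the two main effects would suggest.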

Tip #41: Create dummy variables that evaluate every employee who answers customer questions via Live Chat. Do the same thing for Telephone Reps, and Store Employees. You'll learn who your most valuable employees are!

Tip #42: Create dummy variables for the blogs responsible for sending you the most traffic.

Tip #43: MERCHANDISE MATTERS! Include variables that feature the merchandise divisions or departments that customers purchase from.

Tip #44: Actively model future return rates, and then stop marketing to customers who are predicted to return too much merchandise.

Tip #45: A customer who purchased five $50 items behaves differently from a customer who purchased one $250 item.

Tip #46: The best projects are projects you initiate. Always do what management asks you to do. But keep Friday afternoons free, if you can, to run your own models, to investigate your own ideas.

Tip #47: Never quit. Do not listen to folks who mock you, who call you a "geek", who demean your math in favor of their gut opinions.

Tip #48: Always have a validation sample to validate your results against. In other words, build your model against half of the data (or 70% or whatever you think is the right percentage), and then rank-order the holdout sample to make sure you did a good job.
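
Here is one way the split and the rank-order check might look in plain Python. The function names, the fixed seed, and the toy scores are assumptions for illustration; the scores would normally come from the model you fit on the build sample.

```python
import random

def split_build_holdout(customer_ids, build_share=0.5, seed=2009):
    """Randomly split customers into a build sample and a holdout sample."""
    shuffled = list(customer_ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * build_share)
    return shuffled[:cut], shuffled[cut:]

def response_by_segment(scores, responded, segments=10):
    """Rank-order holdout customers by model score (best first), cut them
    into equal-sized segments, and return the actual response rate of each."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    rates = []
    for s in range(segments):
        members = order[s * n // segments:(s + 1) * n // segments]
        if members:
            rates.append(sum(responded[i] for i in members) / len(members))
    return rates

build, holdout = split_build_holdout(range(1000), build_share=0.7)
print(len(build), len(holdout))

# Hypothetical holdout scores and actual responses (1 = purchased).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(response_by_segment(scores, responded, segments=5))
```

A model that did a good job shows actual response rates that fall as you move from the best-scored segment to the worst. If the rates bounce around, the model is not rank-ordering the holdout sample.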

Tip #49: Some of the best models are built against mail / holdout groups. In other words, you build one model for folks mailed an e-mail or catalog --- and you build another model for folks held out (an equally random sample of customers). Subtract the holdout model's predicted value from the mailed model's predicted value to get the "incremental" value of your campaign. But be careful! This style of analysis is subject to big problems if the models have unstable or non-linear coefficients.
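
The mailed-versus-holdout subtraction can be sketched as follows. The two "models" here are stand-ins written as plain functions with made-up coefficients; in practice each would be a regression fit on its own group.

```python
def incremental_value(customer, mailed_model, holdout_model):
    """Predicted spend if mailed minus predicted spend if not mailed."""
    return mailed_model(customer) - holdout_model(customer)

# Hypothetical fitted models, written as plain functions of order frequency.
# Real coefficients would come from regressions on the mailed and holdout groups.
def mailed_model(customer):
    return 4.00 + 2.50 * customer["orders_12mo"]

def holdout_model(customer):
    return 3.00 + 2.00 * customer["orders_12mo"]

customer = {"orders_12mo": 4}
print(incremental_value(customer, mailed_model, holdout_model))
```

For this customer the mailed model predicts $14.00 and the holdout model predicts $11.00, so the campaign is credited with $3.00. Because the answer is a difference of two predictions, a small wobble in either set of coefficients can swing it a lot.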
