Kevin Hillstrom: MineThatData

Exploring How Customers Interact With Advertising, Products, Brands, and Channels, using Multichannel Forensics.

February 09, 2010

Digital Profiles: Creating Each Profile

You have your data in a "spreadsheet", if you will, one row per customer, each column telling us something about that customer during the past year.

Now, it is time to generate each Digital Profile.

You are free to use whatever methodology you wish. I personally adore a methodology known as "Factor Analysis", because it is elegant: it reduces the dimensionality of a complex dataset to a small number of "factors".

Here's what I do:
  1. I calculate the mean and standard deviation of each variable, for later use.
  2. I run a Factor Analysis.
  3. I extract three or four "Factors".
  4. I run a frequency distribution, to determine the median value for each Factor.
  5. I re-code each Factor, 0 = below 50th percentile, 1 = above 50th percentile.
  6. I score the customer file at multiple points in time, so that I know the "Digital Profile" of the customer at many different times. This information is saved for analysis purposes.
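If it helps to see the six steps in code, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the function names (`fit_profile_model`, `score_profiles`) are hypothetical, the data is randomly generated, and the factor step uses a simple principal-component extraction from the correlation matrix as a stand-in for whatever Factor Analysis routine you prefer. It also assumes no column is constant, since the standard deviation is used as a divisor.

```python
import numpy as np

def fit_profile_model(X, n_factors=4):
    """Fit on one snapshot of the customer file (rows = customers, columns = variables)."""
    X = np.asarray(X, dtype=float)
    # Step 1: mean and standard deviation of each variable, saved for later use.
    means = X.mean(axis=0)
    stds = X.std(axis=0, ddof=1)
    Z = (X - means) / stds
    # Steps 2-3: extract the leading factors. ASSUMPTION: principal-component
    # extraction from the correlation matrix stands in for a full factor analysis.
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]
    weights = eigvecs[:, order]  # scoring weights, one column per factor
    # Step 4: median value of each factor score.
    medians = np.median(Z @ weights, axis=0)
    return {"means": means, "stds": stds, "weights": weights, "medians": medians}

def score_profiles(X, model):
    """Steps 5-6: re-code each factor to 0/1 using the saved statistics."""
    Z = (np.asarray(X, dtype=float) - model["means"]) / model["stds"]
    scores = Z @ model["weights"]
    return (scores > model["medians"]).astype(int)

# Made-up data: 200 customers, 6 variables.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
model = fit_profile_model(X_demo, n_factors=3)
bits = score_profiles(X_demo, model)
```

Because the means, standard deviations, weights, and medians are saved in `model`, the same `score_profiles` call can be applied to the customer file as it looked at other points in time, which is the purpose of step 6.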

Personally, I like using sixteen Digital Profiles (four factors, each split into two groups, gives 2 x 2 x 2 x 2 = 16 combinations).
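As a small illustration of where the sixteen come from, the sketch below joins four 0/1 factor codes into a single profile label. The function name `profile_id` and the string-label format are assumptions made for this example, not a naming scheme from the post.

```python
from itertools import product

def profile_id(flags):
    """Map 0/1 factor codes, e.g. (1, 0, 1, 1), to a label such as '1011'."""
    return "".join(str(int(f)) for f in flags)

# Every possible combination of four binary factors.
all_profiles = [profile_id(f) for f in product((0, 1), repeat=4)]
```

Three factors instead of four would give eight profiles by the same logic (`repeat=3`).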

Enough for today. Next week, we'll dive into naming strategies, analysis and reporting, and Multichannel Forensics / Online Marketing Simulation examples.


4 Comments:

At 6:26 AM , Anonymous Brian said...

Are you using dummy variables for things like

"The merchandise divisions the customer purchased from (think about the tabs running across the top of your website ... yes, this matters, too!)"

and

Day of the week they purchased?


Factor analysis requires continuous variables as far as I know...

 
At 6:44 AM , Blogger Kevin said...

More than anything, you want to avoid creating linear combinations of variables, i.e. dummy variables that are exact combinations of other dummy variables, such as Dec = 1 - Jan - Feb - Mar - Apr - May - Jun - Jul - Aug - Sep - Oct - Nov.

I've used Dummy Variables, I've used % of spend, and I've used $s spent. I've found that combinations of variables work best, from a practical standpoint, meaning that some variables are continuous, some are dummy variables.
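A minimal sketch of the linear-combination point, using month of purchase as the example. The names `MONTHS` and `month_dummies` are hypothetical; the idea is simply that one category is left out, because its dummy variable is fully determined by the other eleven.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def month_dummies(month, drop="Dec"):
    """Return eleven 0/1 dummy variables, omitting the `drop` month."""
    return {m: int(m == month) for m in MONTHS if m != drop}

# The omitted dummy carries no new information:
# it always equals 1 minus the sum of the others.
july = month_dummies("Jul")
implied_dec_for_july = 1 - sum(july.values())
december = month_dummies("Dec")
implied_dec_for_december = 1 - sum(december.values())
```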

 
At 7:17 AM , Anonymous Brian said...

Thanks. I think most literature says avoid mixing, but if it works and provides a meaningful interpretation - great!

One alternative might be to build a correlation matrix for the variables that is "pieced together" using correlations suitable for binary x binary, binary x continuous, etc. Then use a factor program like SAS PROC FACTOR that accepts a correlation matrix as input instead of raw data.

 
At 7:19 AM , Blogger Kevin said...

Sure, you can certainly do that.

Thank you very much for offering your thoughts. I'm confident our readers will benefit from your suggestions!

 
