Kevin Hillstrom: MineThatData

Exploring How Customers Interact With Advertising, Products, Brands, and Channels, using Multichannel Forensics.

October 08, 2008

Catalog And Retailer Differences In Matchback Strategy And Contact Strategy Optimization

There's been a huge shift in multichannel marketing strategy in recent years, and catalog matchback algorithms have played a significant role in that shift.

Fashion retailers (Neiman Marcus, Saks, Bloomingdales, Nordstrom) have either eliminated traditional catalog marketing programs, or are in the process of significantly reducing circulation. Folks at Williams Sonoma are significantly trimming circulation as well.

When I talk to some of you, you tell me that these folks can cut circulation because they are retailers --- the retail channel somehow generates brand awareness that fuels the brand in a way that minimizes the need for advertising. You might be right; we simply cannot test your hypothesis.

Mechanically, retail brands are better at developing a testing discipline.

Here's an example. We randomly sample twenty customers: ten receive a catalog, ten do not. We then measure performance across channels during the three weeks that the catalog is active. Here's what we observe:

Mailed                      Holdout
Cust 1    Buy Store         Cust 11   No Purchase
Cust 2    No Purchase       Cust 12   No Purchase
Cust 3    No Purchase       Cust 13   Buy Online
Cust 4    No Purchase       Cust 14   No Purchase
Cust 5    Buy Phone         Cust 15   No Purchase
Cust 6    Buy Online        Cust 16   No Purchase
Cust 7    No Purchase       Cust 17   No Purchase
Cust 8    No Purchase       Cust 18   No Purchase
Cust 9    Buy Online        Cust 19   No Purchase
Cust 10   No Purchase       Cust 20   Buy Online

Here's the fundamental difference between the retailer and the catalog brand.

The retailer will compare the mailed group and the holdout group. In the mailed group, four out of ten customers responded --- in the holdout group, two out of ten customers responded. The retailer calculates response as (4 - 2) / 10 = 20%.

The cataloger does not execute the test. Instead, the cataloger takes the mailed group, identifies the four responses, matches the responses back to the mail file, and calculates response as 4 / 10 = 40%.

Again, notice the significant difference in response, using the two methodologies.
  • Retailer = 20% Response Rate.
  • Cataloger = 40% Response Rate.
In this comparison, the organic percentage is 20% / 40% = 50%. Half of the demand would happen without any advertising.
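
To make the arithmetic concrete, here is a minimal sketch of both calculations in Python, using the twenty-customer example above (the data structures are mine, purely for illustration):

```python
# Purchases observed during the three weeks the catalog is active.
# None = no purchase; otherwise, the channel the customer bought in.
mailed  = ["store", None, None, None, "phone", "online", None, None, "online", None]
holdout = [None, None, "online", None, None, None, None, None, None, "online"]

mailed_rate  = sum(p is not None for p in mailed) / len(mailed)    # 4 / 10 = 40%
holdout_rate = sum(p is not None for p in holdout) / len(holdout)  # 2 / 10 = 20%

retailer_response  = mailed_rate - holdout_rate   # incremental response: 20%
cataloger_response = mailed_rate                  # matchback response:   40%
organic_share = (cataloger_response - retailer_response) / cataloger_response

print(f"Retailer = {retailer_response:.0%}, Cataloger = {cataloger_response:.0%}")
print(f"Organic share of demand = {organic_share:.0%}")  # 50%
```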

This fundamental difference in approach causes a shift in strategy.
  • Retailer = Cut Circulation, Re-Allocate Marketing Dollars Elsewhere, Learn!!
  • Cataloger = Maintain Circulation, Ask For Additional Funding For Online Marketing, And Significantly Over-Spend In The Catalog Marketing Channel, Driving Down Profit.
This problem is systemic across the catalog industry. Matchback vendors aren't trying to rip you off, they simply aren't. But there isn't an incentive to create a "best practice" that accounts for the differences between what retailers observe when executing contact strategy tests and what catalogers measure via matchback analytics.

A simple solution for catalogers is to execute a test similar to the one designed above. Do not tell the matchback vendor about the holdout group. Have the matchback vendor run the control group through the matchback algorithm, and see how many orders are allocated to the holdout group. Subtract the results of the holdout group from the results of the mailed group, and you have true incremental demand as illustrated in the retail example at the beginning of this post.
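
Here is a sketch of that correction. The matchback_orders function below is a hypothetical stand-in for the vendor's algorithm, not any vendor's actual code; the point is simply that the holdout group flows through identical logic:

```python
from datetime import date, timedelta

def matchback_orders(mail_date, order_dates, window_days=21):
    # Hypothetical stand-in for the vendor's matchback: credit any order
    # placed within the window after the in-home date to the catalog.
    cutoff = mail_date + timedelta(days=window_days)
    return sum(mail_date <= d <= cutoff for d in order_dates)

mail_date = date(2008, 10, 1)
mailed_orders  = [date(2008, 10, 4), date(2008, 10, 9),
                  date(2008, 10, 12), date(2008, 10, 20)]  # mailed group
holdout_orders = [date(2008, 10, 7), date(2008, 10, 18)]   # holdout group

matched_mailed  = matchback_orders(mail_date, mailed_orders)   # 4 "responses"
matched_holdout = matchback_orders(mail_date, holdout_orders)  # 2 "responses"
incremental_orders = matched_mailed - matched_holdout          # 2 true responses
```

If the two groups differ in size, convert each count to a rate before subtracting.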


Hillstrom's Contact Strategy Optimization: A New E-Book.

June 03, 2008

Great Moments In Database Marketing #1: Incremental Value

Our top rated Database Marketing moment takes us back to 1993 - 1994. Yeah, way back then, people were doing sophisticated work. Honestly!

Way back in the early 1990s at Lands' End, we had seven different business units that marketed to customers, either through standalone catalogs, or through pages added to catalogs.

As growth became more and more difficult (pay close attention online marketers ... your world is heading in this direction), management elected to mail targeted catalogs to targeted customer segments.

In other words, a Mens Tailored catalog concept was developed, with a half-dozen or more incremental catalogs mailed to customers who preferred Mens Tailored merchandise. A Home catalog concept was developed, with nine or more incremental catalogs mailed to customers who preferred Home merchandise.

Seven concepts were developed. Each concept was growing.

But the core catalog, the monthly catalog mailed for three decades, was not really growing anymore. And total company profit (as a percentage of net sales) was generally decreasing over time.

Something was amiss.

We studied the housefile, and learned that the "best" customers were being "bombed" by catalogs ... upwards of forty a year. Every business unit, making independent mailing decisions, mailed essentially the same customers. And all of our metrics, when viewed at a corporate level, indicated that customers were not spending fundamentally more than they spent several years ago when the new business concepts didn't exist.

So we developed a test. We selected ten percent of our housefile, and created seven columns in a spreadsheet. We randomly populated each column with the words "YES" or "NO", at a 50% / 50% proportion. Each business unit was assigned to a column. When it came time to make mailing decisions for that business unit, we referred to the column assigned to the business unit. If the word "NO" appeared, we did not mail the customer, even if the customer qualified for the mailing based on RFM or model score criteria.

In statistics, this is called a 2^7 Factorial Design.
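
Here is a minimal sketch of how such an assignment might be generated today; the spreadsheet columns become a dictionary per customer, and the unit names and customer IDs are purely illustrative:

```python
import random

random.seed(1993)  # reproducible assignment
UNITS = [f"Unit {i}" for i in range(1, 8)]  # seven business units

def factorial_assignment(customer_ids):
    # Each test customer receives an independent 50/50 YES/NO flag per
    # business unit -- the 2^7 factorial design described above.
    return {cid: {unit: random.choice(["YES", "NO"]) for unit in UNITS}
            for cid in customer_ids}

plan = factorial_assignment(["C0001", "C0002", "C0003"])
# A business unit mails a customer only when its flag says YES *and* the
# customer qualifies on RFM or model score; a NO overrides qualification.
```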

There are two reasons for designing a test of this nature.
  1. Quantify the incremental value (sales and profit) that each business unit contributes to the total brand.
  2. Identify, across customers segments, the number of catalogs a customer should receive to optimize profitability.
What did we learn?
  1. Each catalog mailed to a customer drove a smaller and smaller incremental increase in sales. If a dozen catalogs caused a customer to spend $100, then two dozen catalogs caused customers to spend $141, and three dozen catalogs caused customers to spend $173. The relationship roughly approximated the Square Root Rule you've read so much about on this blog (see the quick check after this list).
  2. Each business unit, on average, was contributing only 70% of the volume that company reporting suggested the business unit was contributing. In other words, if you didn't mail the catalogs, you'd lose 70% of the sales, with customers spending 30% elsewhere.
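
The first finding is the Square Root Rule at work; here is a quick check of the arithmetic:

```python
import math

base_spend, base_books = 100.0, 12
for books in (12, 24, 36):
    spend = base_spend * math.sqrt(books / base_books)
    print(f"{books} catalogs -> ${spend:,.0f}")  # $100, $141, $173
```
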
The latter point is critical.

Take a look at the table below, one that illustrates the profit and loss statement reported by finance, and one that applies the results of the test.

Test Results Analysis

                              Finance        From Test
                              Reported       Results
Demand                        $50,000,000    $35,000,000
Net Sales (82.0%)             $41,000,000    $28,700,000
Gross Margin (55.0%)          $22,550,000    $15,785,000
Less Marketing Cost           $9,000,000     $9,000,000
Less Pick/Pack/Ship (11.0%)   $4,510,000     $3,157,000
Variable Profit               $9,040,000     $3,628,000
Less Fixed Costs              $6,000,000     $6,000,000
Earnings Before Taxes         $3,040,000     ($2,372,000)
% Of Net Sales                7.4%           -8.3%

The test indicated that what appeared to be highly profitable business units were actually marginally profitable, or in some cases, unprofitable. In this example, the business unit is "70% incremental", meaning that if the business unit did not exist, 70% of the sales volume would disappear, while 30% would be spent anyway by the customer, spent on other merchandise.
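
Here is a minimal sketch that reproduces both columns of the table, assuming the cost ratios shown there (82% net-sales-to-demand, 55% gross margin, 11% pick/pack/ship, fixed marketing and overhead):

```python
def pnl(demand, incremental=1.00):
    # Cost ratios taken from the table above.
    demand = demand * incremental
    net_sales = demand * 0.82
    gross_margin = net_sales * 0.55
    pick_pack_ship = net_sales * 0.11
    variable_profit = gross_margin - 9_000_000 - pick_pack_ship
    ebt = variable_profit - 6_000_000
    return ebt, ebt / net_sales

print(pnl(50_000_000))                     # $3,040,000 and 7.4%: finance view
print(pnl(50_000_000, incremental=0.70))   # -$2,372,000 and -8.3%: test view
```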

Imagine being the EVP responsible for a business unit that appears to generate 7.4% pre-tax profit, only to have some rube in the database marketing department tell you that your efforts are actually draining the company of profit.


Why Does This Matter?

This style of old-school testing (which is more than a hundred years old, with elements of the testing strategy now employed aggressively in online marketing) tells you how valuable your marketing and merchandising initiatives truly are.

Catalogers fail to do this style of testing, not realizing that a portion of catalog-driven sales would still be generated online (or in other catalogs). In 2008, most catalog marketers are grossly over-mailing existing buyers. Catalog Choice exists, in part, because catalogers mis-read this phenomenon.

E-mail marketers seldom execute these tests, not realizing that in many cases almost all of the sales would still be generated online. E-mail marketers, ask your e-mail marketing vendor to partner with you on test designs like the ones mentioned in this article. You may be surprised by what you learn!

Online marketers are more likely than most marketers to execute A/B splits at minimum, with some executing factorial designs. Many online brands evolve in a Darwinian style, fueled by the results of factorial designs. Online marketers know that you make mistakes quickly, and you correct those mistakes quickly.

Web Analytics folks have the responsibility to tell management when sku proliferation no longer contributes to increased sales. It is important for Web Analytics folks to lead the online marketing community, shutting off portions of the website in various tests to understand the incremental value of each additional sku.

What are your thoughts on this style of testing? What have you learned by executing tests of this nature?


May 29, 2008

Great Moments In Database Marketing #6: Long-Term Impact of Promotions at Eddie Bauer

We go back to 1998 for this Great Moment in Database Marketing.

At the time, I was Director of Circulation at Eddie Bauer, a brand that was punch-drunk on promotions. Anytime a customer failed to purchase within six months, the "CRM/Circulation" process offered the customer a "20% off $100" promotion --- twenty percent off your next order of one hundred dollars or more.

We tested these promotions until we were blue in the face. The tests continually showed that the customer spent about twenty percent more when offered this promotion.

So, the promotions became part of "what we did". And then my team decided to execute a long-term test. For the next six months, we would not offer a segment of lapsed customers a single promotion.

What do you think happened?

Take a look at the following table, a table that approximates the actual results of the test.

Eddie Bauer Six Month Promotion Test: 1998

                  Received Promos   No Promos   Increment
Month 1           $10.80            $9.00       $1.80
Month 2           $9.00             $9.30       ($0.30)
Month 3           $10.80            $9.60       $1.20
Month 4           $9.00             $9.90       ($0.90)
Month 5           $10.80            $10.20      $0.60
Month 6           $9.00             $10.50      ($1.50)

Demand            $59.40            $58.50      $0.90
Net Sales         $41.58            $40.95      $0.63
Gross Margin      $22.87            $22.52      $0.35
Marketing         $9.00             $9.00       $0.00
Promos            $4.07             $0.00       $4.07
Pick/Pack/Ship    $4.99             $4.91       $0.08
Profit            $4.80             $8.61       ($3.80)
% of Sales        11.6%             21.0%       -9.5%

Oh oh.

Here's the 411, folks. When customers are continually promoted to, they delay purchases until the promotion is offered to them.

In our test, if customers were not offered promotions, they slowly began to "build momentum". Instead of the every-other-month cadence of promotions to this audience (the actual test had a different rhythm than illustrated above), the customer waited for promotions, did not receive them, then started spending more.

After six months, we noticed that customer spend in the two groups was nearly identical!

Now look at profit. Sure, the group that received promotions appeared profitable --- they appeared profitable via every system we had in the company, via every A/B test we executed.

But when viewed via a long-term A/B test, the results were significantly different. We were losing a boatload of money promoting to customers who would ultimately spend the same amount of money if we didn't execute the promotion.
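
Here is a minimal sketch of the bottom half of the table, using the per-customer cost ratios it implies (70% net-sales-to-demand, 55% gross margin, 12% pick/pack/ship, $9.00 of marketing cost):

```python
def promo_pnl(demand, promo_cost, marketing=9.00):
    # Six-month, per-customer P&L, ratios implied by the table above.
    net_sales = demand * 0.70
    gross_margin = net_sales * 0.55
    pick_pack_ship = net_sales * 0.12
    profit = gross_margin - marketing - promo_cost - pick_pack_ship
    return profit, profit / net_sales

print(promo_pnl(59.40, promo_cost=4.07))  # ~$4.80 profit, 11.6% of net sales
print(promo_pnl(58.50, promo_cost=0.00))  # ~$8.61 profit, 21.0% of net sales
```

Nearly identical demand, yet the promoted group earns roughly $3.80 less profit per customer, matching the ($3.80) increment in the table.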

In 1999, we dramatically pulled back on promotions. Total Net Sales decreased by maybe five or six percent. Total profit hit an all-time record high.

The core fundamentals of direct marketing are often violated in the world of "instant metrics" we've created. Our e-mail marketing friends read open rates and conversion rates from a "Free Shipping" e-mail within an hour of blasting the campaign. The adrenaline rush felt from obtaining instant access to customer behavior fuels strategy.

My challenge to the e-mail marketing and web analytics community, two communities that live and die by a steady diet of exhilarating and instantaneous metrics, is this ... do your metrics allow you to understand if what we observed at Eddie Bauer in 1998 is happening in your business? And if your visit-specific metrics don't allow you to observe a trend like this, what kind of systems/software/human investment is needed to allow for this style of measurement?


Hillstrom's Multichannel Secrets: Fifty-Nine Facts For CEOs!

April 26, 2008

Retail Catalog Marketing

Retail catalog marketing is an inexact, imprecise science.

Let's assume that a major American retail brand sends you a catalog on April 1. Let's also assume that your small business purchases from this major American retail brand on the 15th of every month, regardless of marketing activity.

Did the catalog cause you to purchase merchandise?

The answer is probably "no".

The catalog may have influenced the merchandise you purchased. The catalog may have caused you to spend more than you normally would have. The catalog may have caused you to spend less than you normally would have.

But you would have purchased merchandise anyway, no matter what. You always buy something from this brand on the 15th of the month.

Now let's pretend you are the Database Marketing Executive at this major American retail brand. Your job is to measure the effectiveness of this retail catalog marketing effort. Using the tools and techniques available to the database marketers, let's see if you would decide to mail this sample customer future catalogs.


Methodology = Mail And Holdout Groups: Do Not Mail This Customer A Catalog

This is a classic direct marketing strategy, practiced for more than a century (and maybe for centuries). When measuring effectiveness by mail and holdout groups, we'd learn that this customer would purchase regardless of catalog marketing. Therefore, the segment this customer belongs to is not considered a "responder".


Methodology = Pattern Detection: Do Not Mail This Customer A Catalog

Pattern detection suggests that this customer buys on the 15th of every month. The database marketing executive learns that marketing doesn't influence this customer. Therefore, this individual customer would not be considered a responder.
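
Here is a toy sketch of what pattern detection might look for; the threshold and purchase history are illustrative, not a production rule:

```python
from collections import Counter
from datetime import date

purchases = [date(2008, m, 15) for m in range(1, 7)]  # buys on the 15th

day_counts = Counter(d.day for d in purchases)
day, hits = day_counts.most_common(1)[0]
if hits / len(purchases) >= 0.8:
    print(f"Customer buys on day {day} of the month, regardless of marketing")
```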


Methodology = Matchback Analytics: Mail This Customer A Catalog

Matchback analytics, the kind offered by major list processing corporations, co-ops, and data compilers, match purchases within a window of time to a marketing activity. Let's say that the matchback window is three weeks (oftentimes, the matchback window is something silly, like ninety days or six months). Any retail purchase within three weeks of the catalog mailing is attributed to the catalog mailing. Therefore, this individual customer would be considered a responder. Here's a little secret. Matchback analytics grossly over-state the effectiveness of most retail activities. You've been warned!!
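
To see why, consider a sketch of the window logic applied to our sample customer; the dates and the function are illustrative, not any vendor's code:

```python
from datetime import date, timedelta

def attributed_to_catalog(mail_date, purchase_date, window_days=21):
    # Matchback logic: any purchase inside the window after the in-home
    # date is credited to the catalog.
    return mail_date <= purchase_date <= mail_date + timedelta(days=window_days)

catalog_mailed = date(2008, 4, 1)
purchase = date(2008, 4, 15)  # our customer buys on the 15th regardless

print(attributed_to_catalog(catalog_mailed, purchase))  # True: a "responder"
# Widen the window to ninety days, and the May 15 and June 15 purchases
# match back to this one catalog as well.
```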


Methodology = Brand Marketing: Mail This Customer A Catalog

All too often, retail catalog marketing falls into the brand marketing arena. In other words, a budget is set, say $1,000,000. The database marketing team is asked to mail a million customers, to use up the entire budget. The database marketing team executes the strategy. In this case, if our sample customer buys every month, the customer is a "good" customer, and will receive this catalog. This is the most common scenario in retail catalog marketing --- the CMO determines a budget, the CMO determines the marketing tactics that will be employed, and the database marketing executive picks the best customers for any given strategy. In some instances, rogue database marketers set up tests to determine if the strategy actually worked or not. I've executed this rogue strategy myself --- I wanted to understand how much money my company was losing. For the most part, however, the effectiveness of the mailing isn't even measured.


Retail catalog marketing is an inexact, imprecise science. The corporate culture, the quality of information captured in the customer database, and the measurement technique used by the database marketing team determine whether you will receive a retail catalog from your favorite American retail brand.

How does your company execute measurement of retail catalog marketing activities?


January 03, 2008

Testing Issues

Recall that my focus in 2008 is on multichannel profitability.

Experimental design (aka 'tests') is one of the most useful tools available to help us understand multichannel profitability.

We run into a ton of problems when designing and analyzing 'tests'. Let's review some of the problems.


Problem #1: Statistical Significance

Anytime we want to execute a test, a statistician will want to analyze the test (remember, I have a statistics degree --- I want to analyze tests!).

In order to make sense of the conclusions, the statistician will introduce the concept of "statistical significance". In other words, the statistician will tell you if the difference between a 3.0% and 2.9% click-through rate is "meaningful". If, according to statistical equations, the difference is not deemed to be "meaningful", the statistician will tell you to ignore the difference, because the difference is not "statistically significant".
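
For instance, here is a minimal sketch of the test a statistician might run on that click-through comparison, using a pooled two-proportion z-test with a normal approximation (the sample sizes are hypothetical):

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    # Pooled two-proportion z-test; returns z and a two-sided p-value.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# 3.0% vs. 2.9% click-through, 10,000 e-mails per side (hypothetical):
z, p = two_proportion_z(0.030, 10_000, 0.029, 10_000)
print(z, p)  # z is about 0.42, p is about 0.68 -- not "significant"
```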

Statisticians want you to be right 90% of the time, or 95% of the time, or 99% of the time.

We all agree that this is critical when measuring the effectiveness of a cure for AIDS. We should all agree that this isn't so important when measuring the effectiveness of putting the shopping cart in the upper right-hand corner of an e-mail campaign.

Business leaders are seldom given opportunities to capitalize on something that will work 95% of the time. Every day, business leaders make decisions based on instinct and gut feel, without any data to support the decision. Knowing that something will work 72% of the time is a blessing!

Even worse, statistical significance only holds if the conditions that existed at the time of the test are identical to the conditions that exist today. Business leaders know that this assumption can never be met.

Test often, and don't limit yourself to making decisions only when you're likely to be right 99% of the time. You'll find yourself never making meaningful decisions if you have to be right all the time.


Problem #2: Small Businesses

Large brands have testing advantages. A billion dollar business can afford to hold out 100,000 customers from a marketing activity. The billion dollar business gets to slice and dice this audience fifty different ways, feeling comfortable that the results will be consistent and reliable.

Small businesses are disadvantaged. If you have a housefile of 50,000 twelve-month customers, you cannot afford to hold out 10,000 from a catalog or e-mail campaign.

However, a small business can afford to hold out 1,500 twelve-month customers out of 50,000. The small business will not be able to slice and dice the data the way a large brand can. The small business will have to make compromises.

For instance, look at the variability associated with ten customers, four of whom spend money:
  • $0, $0, $0, $0, $0, $0, $50, $75, $150, $300.
    • Mean = $57.50.
    • Standard Deviation = $98.63.
    • Coefficient of Variation = $98.63 / $57.50 = 1.72.
Now look at the variability associated with measuring response (purchase = 1, no purchase = 0).
  • 0, 0, 0, 0, 0, 0, 1, 1, 1, 1
    • Mean = 0.40.
    • Standard Deviation = 0.516.
    • Coefficient of Variation = 0.516 / 0.40 = 1.29.
The small company can look at response, realizing that response is about twenty-five percent "less variable" than the amount of money a customer spent.
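
A quick way to verify those figures:

```python
import statistics

spend = [0, 0, 0, 0, 0, 0, 50, 75, 150, 300]
response = [1 if amount > 0 else 0 for amount in spend]

for label, data in (("spend", spend), ("response", response)):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    print(f"{label}: mean={mean:.3f}, sd={sd:.3f}, cv={sd / mean:.2f}")
# spend cv = 1.72, response cv = 1.29 -- about 25% less variable
```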

Small companies need to analyze tests, sampling 2-4% of the housefile in a holdout group, focusing on response instead of spend. The small company realizes that statistical significance may not be achievable. The small company looks for "consistent" results across tests. The small company replicates the rapid test analysis document, using response instead of spend.


Problem #3: Timeliness

The internet changed our expectations for test results. Online, folks are testing strategies in real-time, adjusting landing page designs on Tuesday morning based on results from a test designed Monday morning, executed Monday afternoon.

In 1994, I executed a year-long test at Lands' End. I didn't share results with anybody for at least nine months. What a mistake. We had spirited discussions from month ten to month twelve that could have been avoided if communication started sooner.

Start analyzing the test right away. Share results with everybody who matters. Adjust your results as you obtain more information. It is ok that the results change from month two to month three to month twelve, as long as you tell leadership that results may change. Given the fact that the online marketers are making changes in real-time, you have to be more flexible.


Problem #4: Belief

You're going to obtain results that run contrary to popular belief.

You might find that your catalog drives less online business than matchback results suggest. You might find that advertising womens merchandise in an e-mail campaign causes customers to purchase cosmetics.

You might find that your leadership team dismisses your test results, because the results do not hold up to what leadership "knows" to be true.

Just remember that people once thought the world was flat, that the universe orbited Earth, and that subprime mortgages could be packaged with more stable financial instruments for the benefit of all. If unusual results can be replicated in subsequent tests, the results are not unusual.

Leadership folks aren't a bunch of rubes. They have been trained to think a certain way, based on the experiences they've accumulated over a lifetime. It will take time for those willing to learn to change their point of view. It does no good to beat them over the head with "facts".


January 01, 2008

Rapid Test Results

In 2008, I'm going to focus energy discussing how test results and Multichannel Forensics increase profitability, and hopefully decrease customer dissatisfaction. Today, we begin the discussion by exploring the concept behind a project I call "Rapid Test Results".

One of the easiest ways for multichannel catalogers, retailers and e-mail marketers to understand customer behavior is through the use of "A/B" tests.

In an "A/B" test, one representative group of customers receive a marketing activity, while the other representative group of customers do not receive a marketing activity.

The catalog industry uses matchback algorithms to understand multichannel behavior. As most of us understand, matchback algorithms over-state the effectiveness of marketing activities.

Conversely, e-mail marketers understate the effectiveness of e-mail marketing activities when using open rates, click-through rates, and conversion rates.

Therefore, we need to improve the understanding of our marketing activities. One way to do this is to create and analyze more "A/B" tests, often called "mail/holdout" tests.

It can be very easy to execute these tests.

However, we don't always have the resources necessary to analyze and understand the test results.

If you are an executive who falls into the latter category, I have something for you. It is called "Rapid Test Results".

For my loyal blog readers, executives, and current customers, I have an inexpensive proposal just for you. The Rapid Test Results Analysis Document outlines an inexpensive project that gets you results from the tests you executed within just a few days of sending your information for analysis.

If there's one thing I learned in 2007, it is that e-mail and catalog teams are minimally staffed! And yet, the information that can be gleaned from tests executed by e-mail and catalog marketing teams can shape the future direction of your organization.

So if any of the following criteria are met by your organization, please consider a Rapid Test Results Project:
  • You are an e-mail marketer who believes your e-mail campaigns drive more sales and profit than you can measure via standard metrics like open rate, click-through rate, and conversion rate.
  • You are a catalog marketer who wants to truly understand if multichannel customers respond to catalog marketing, and wants to truly learn the impact of catalog marketing on the online channel.
  • You are a catalog marketer who wants to reduce catalog marketing expense (and benefit the environment) by limiting contacts to internet customers.
  • You do not have the analytical resources to analyze test results quickly.
  • You do not have the systems support to measure test results by different customer segments, across different channels, or across different merchandise classifications.
  • Your executive team does not understand the constraints and limitations that prevent your team from analyzing all of your tests in a timely manner.


December 07, 2006

A/B Test Design And Incremental Multichannel Campaign Performance

Never before has the traditional "A/B" test been as important as it is in our multichannel ecosystem. Such a simple concept, the "A/B" test is uniquely designed to measure the incremental performance of marketing activities.

As an example, assume a multichannel organization wants to mail a catalog to 1,000,000 housefile names. The database marketer chooses the best 1,100,000 households, and randomly splits them into two groups. The "A" portion of the test is the 1,000,000 households who receive the catalog. The "B" portion of the test is the 100,000 households who will not receive the catalog.

Maybe a month after the in-home date, the database marketing analyst is prompted to analyze the results. Within each group, the 1,000,000 who received the catalog, and the 100,000 who didn't receive it, the analyst calculates the average net sales within the catalog/telephone channel, the online channel, and the retail channel.

Here are sample results:


                          Quantity    Telephone   Online   Retail   Totals
Received Catalog          1,000,000   $6.00       $8.00    $21.00   $35.00
Did Not Receive Catalog   100,000     $2.50       $7.00    $19.50   $29.00
Incremental Lift                      $3.50       $1.00    $1.50    $6.00

In this example, the catalog drove an incremental $3.50 per customer to the catalog/telephone channel, $1.00 per customer to the online channel, and $1.50 per customer to the retail channel, for a total of $6.00 incremental sales per customer.

Because we mailed 1,000,000 households, the total net sales attributed to this mailing is 1,000,000 * $6.00 = $6,000,000.
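
The arithmetic, as a minimal sketch:

```python
mailed  = {"telephone": 6.00, "online": 8.00, "retail": 21.00}  # $ / household
holdout = {"telephone": 2.50, "online": 7.00, "retail": 19.50}

lift = {channel: mailed[channel] - holdout[channel] for channel in mailed}
total_lift = sum(lift.values())  # $6.00 per mailed household

print(lift)                      # telephone $3.50, online $1.00, retail $1.50
print(1_000_000 * total_lift)    # $6,000,000 attributed to the mailing
```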

Some vendors advocate a different methodology --- they advocate allocating any online and retail order generated during the time the catalog was active to the mailing of the catalog. This results in a gross over-estimation of the importance of the catalog. Please don't go down this path.

A similar methodology can be used to test multiple marketing activities at the same time. Assume an e-mail campaign was mailed to the opt-in portion of this audience. Within this audience, you randomly assign customers to one of four test segments. Here are some sample results.



                        Quantity   Telephone   Online   Retail   Totals
Catalog + E-Mail        400,000    $5.50       $8.50    $21.25   $35.25
Catalog Only            50,000     $6.00       $8.00    $21.00   $35.00
E-Mail Only             50,000     $3.00       $8.10    $18.65   $29.75
No Catalog, No E-Mail   50,000     $3.50       $7.00    $18.50   $29.00

Tests like these yield intriguing results. Notice that the best strategy for the catalog/telephone channel was to mail only a catalog. The best strategy for the online channel was to mail a catalog and an e-mail. The best strategy for the retail channel was to mail both a catalog and an e-mail.
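
One useful way to read the table is to compute each cell's lift over the no-contact cell; a minimal sketch:

```python
cells = {
    "Catalog + E-Mail":      {"telephone": 5.50, "online": 8.50, "retail": 21.25},
    "Catalog Only":          {"telephone": 6.00, "online": 8.00, "retail": 21.00},
    "E-Mail Only":           {"telephone": 3.00, "online": 8.10, "retail": 18.65},
    "No Catalog, No E-Mail": {"telephone": 3.50, "online": 7.00, "retail": 18.50},
}

control = cells["No Catalog, No E-Mail"]
for name, sales in cells.items():
    lift = {ch: round(sales[ch] - control[ch], 2) for ch in control}
    print(f"{name}: {lift}, total lift = {round(sum(lift.values()), 2)}")
```

Note that the E-Mail Only cell actually loses telephone volume versus the control cell, which is exactly the kind of channel interaction these tests surface.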

Statisticians can assist with significance tests, if you feel that is appropriate. It is more important to simply execute tests of this nature, and learn how all of your marketing activities interact with each other. What you learn about how marketing activities and channels interact with each other within our multichannel ecosystem may surprise you.


November 17, 2006

Williams Sonoma: Incremental Online Sales and Matchback Analysis

Williams Sonoma always does a nice job of sharing fun facts with the public. In their third quarter earnings release, they state that "55% of online revenues are generated by customers who recently received a catalog."

This is always an interesting topic of debate in the database marketing world. Williams Sonoma does not specifically state which of two popular analytical methods they use to measure this metric.

Most popular, and most vigorously argued against by the analytically adept, is the method of attributing every online order to the catalog channel if a customer recently received a catalog. The theory behind this technique (often called a "matchback analysis") is that the catalog inspired the order. Many vendors promote this methodology, and for good reason: the technique can overstate orders attributed to mailed catalogs, and vendors have a vested interest in promoting paper as a viable means of profitable marketing. Critics will argue that if you mail your entire housefile, this methodology will cause you to attribute every single online order to the mailing of the catalog. Critics will also argue that if you mail every housefile name a catalog, and send every housefile name an e-mail, the methodology completely breaks down, rendering the analysis useless.

Less popular is the method of an "A/B" split. The marketer randomly splits her mail list into two halves: 50,000 customers receive the catalog, and a like group of 50,000 customers does not receive the catalog. Several weeks after the in-home date, the marketer measures total sales in the mail group and the control group, in both catalog and online (and, where applicable, retail) channels. This method tends to provide much less optimistic answers than the "matchback analysis". Critics will argue that this methodology cannot produce reliable results due to sampling error issues.

Which methodology do you believe is more appropriate for allocating online orders to the marketing channel that drove the order?
