- Recommendation Types
- Choosing an Approach
- Providing Manual Recommendations
- Providing Automatic Recommendations
- Final Thoughts
Providing Automatic Recommendations
If it makes more sense to go the “automatic” route, then you actually don’t need to create a new table at all. A JOIN across the products and order_contents tables will do the trick. That JOIN is a bit complicated, so I’ll walk through it in steps.
If the current product is represented by $pid, then this next query retrieves the product IDs for other products that were bought in orders in which $pid was also purchased:
SELECT oc2.product_id FROM order_contents AS oc1 JOIN order_contents AS oc2 ON oc1.order_id=oc2.order_id WHERE oc1.product_id=$pid AND oc2.product_id != $pid
The second part of the WHERE clause must be added to prevent the current product from also being returned.
To get the actual product information for the returned list of products, you need to add a JOIN to products:
SELECT p.* FROM order_contents AS oc1 JOIN order_contents AS oc2 ON oc1.order_id=oc2.order_id JOIN products AS p ON p.product_id=oc2.product_id WHERE oc1.product_id=$pid AND oc2.product_id != $pid
Now the query retrieves every product that was purchased at the same time as the current product. However, each also-purchased product may be returned multiple times, and there’s no merit given to each also-purchased product. By that I mean that the children’s book that happened to be purchased at the time someone bought an iPod is returned just the same as the iPod accessories more commonly purchased with an iPod. To fix that (i.e., to make the recommendation smarter), the query needs to count the number of times an item appears and sort the result by that count:
SELECT p.* FROM order_contents AS oc1 JOIN order_contents AS oc2 ON oc1.order_id=oc2.order_id JOIN products AS p ON p.product_id=oc2.product_id WHERE oc1.product_id=$pid AND oc2.product_id != $pid GROUP BY oc2.product_id ORDER BY COUNT(oc2.product_id) DESC
As you might imagine, having a query like this executed on the fly for every product displayed will take a toll on the site’s performance. I’d be inclined to turn that query into a VIEW table. A VIEW table is a table created in memory that represents the results of a query. By using a VIEW to represent the complex JOIN, you can perform simple SELECT queries as needed.
Instead of creating one VIEW for each product (the above query only returns the recommendations for a single item), the VIEW should represent perhaps the five best product recommendations for each product in the database (that has been ordered). To do that, the SELECT query needs to be modified so that it:
- Also selects the original product ID
- Also selects the COUNT() value (i.e., the significance of a recommendation)
- Applies a LIMIT clause
Here, then, is the SQL command used to define a VIEW:
CREATE VIEW product_recommendations AS SELECT p1.product_id AS product_id, COUNT(oc2.product_id) AS rank, p2.* FROM order_contents AS oc1 JOIN order_contents AS oc2 ON oc1.order_id=oc2.order_id JOIN products AS p2 ON p2.product_id=oc2.product_id JOIN products AS p1 ON oc1.product_id=p1.product_id WHERE oc1.product_id=p1.product_id AND oc2.product_id != p1.product_id GROUP BY oc2.product_id ORDER BY rank DESC LIMIT 5
Whew! That’s a pretty good query and a great use of VIEW tables. Now each product page can fetch the recommendations using a simple SELECT:
SELECT * FROM product_recommendations WHERE product_id=$pid ORDER BY rank DESC
The database will be able to return those query results much faster than it would the original complex JOIN.