Anderson's Angle

Using Reviews to Create a Recommender System That Works

Published February 1, 2022

Updated December 9, 2022

Martin Anderson

If you have ever bought a product online and marveled at the inanity and non-applicability of the ‘related items’ that haunt the buying and after-sales process, you already understand that popular and mainstream recommender systems tend to fall short in terms of understanding the relationships between prospective purchases.

If you buy an unlikely and infrequent item, such as an oven, recommendations for other ovens are likely to be superfluous, though the worst recommender systems fail to acknowledge this. In the 2000s, for example, TiVo’s recommender system created an early controversy in this sector by reassigning the perceived sexuality of a user, who subsequently sought to ‘re-masculinize’ his user profile by selecting war movies – a crude approach to algorithm revision.

Worse yet, you don’t need to actually buy anything at (for instance) Amazon, or actually begin watching a movie whose description you’re browsing at any major streaming platform, in order for information-starved recommender algorithms to start merrily down the wrong path; searches, dwells and clicks into the ‘details’ pages are enough, and this scant (and probably incorrect) information is likely to be perpetuated across future browsing sessions at the platform.

Trying to Make a Recommender System Forget

Sometimes it’s possible to intervene: Netflix provides a ‘thumbs up/down’ system which should in theory help its machine learning algorithms remove certain embedded concepts and words from your recommendations profile (though its efficacy has been questioned, and it remains much easier to evolve a personalized recommender algorithm from scratch than it is to remove undesired ontologies), while Amazon lets you remove titles from your customer history, which should downgrade any unwelcome domains that infiltrated your recommendations.

Hulu has a similar feature, while HBO Max has partially retreated from algorithm-only recommender systems, in the face of their current shortcomings.

None of these strictly consumer-level experiences even touch on the widespread and growing criticism of ‘passive’ advertising platform recommender systems (where notable change is coming due to public ire), or the incendiary topic of social media AI recommendations, where sites such as YouTube, Twitter and Facebook continue to endure criticism for non-relevant or even damaging recommendations.

The machine doesn’t seem to know what we want, unless we want the adjacent item that came up in our search – even if that item is essentially a duplicate of, or an alternative to, the primary item that we may have just bought, rather than a potential complementary or ancillary purchase.

Accurate Recommendations with Review Data

A new research collaboration from China and Australia offers a novel method to address such non-apposite recommendations, by using external user-reviews to gain a better understanding of the real relationships between items in a shopping session. In tests, the architecture outperformed all current state-of-the-art methods, offering hope for recommender systems that have a better internal map of the dependencies of items:

RI-GNN outperforms major competitors in terms of accuracy of relationships between items, performing best on sessions with more than five items. The system was tested against the Pet Supplies and Movies and TV datasets from Amazon Review Data (2018). Source: https://arxiv.org/pdf/2201.12532.pdf

To boot, the project addresses the notable challenge of creating recommendations even in anonymous sessions, where the recommender system has no access to user-contributed details, such as purchase history, or the user’s own online reviews of prior purchases.

The new paper is called Rethinking Adjacent Dependency in Session-based Recommendations, and comes from researchers at the Qilu University of Technology and the Beijing Institute of Technology in China, RMIT University at Melbourne, and the Australian Artificial Intelligence Institute at the University of Technology Sydney.

What’s Next?

The core task of session-based recommendations (SBR) is to determine the ‘next’ item along from the current item, based on its calculated relationship to the current item. In practical terms, this could manifest as a list of ‘Related items’ in an item page for a bird-cage at an ecommerce web site.

If you’re buying a bird cage, what else are you likely to need? Well, at the very least, you’re going to need a bird to put in it – that’s a true dependency. However, the bird cage is featured in the ‘pet goods’ ontology, where birds are not sold. Perversely, cat food sits in the same ontology, though appending a cat-feeding bowl as an associated recommendation for a bird cage is a false dependency – a mistaken and misguided association.

From the paper: true and false relationships between several items, visualized on the right as an inter-item graph.

As is so often the case in machine learning architectures, it is a challenge to persuade a recommender system that a ‘distant’ entity (bird does not feature at all in pet products) may have an intrinsic and important relationship to an item, whereas items that are in the same category, and very close in function and central concept (such as cat feeding bowl), may be orthogonal or directly opposed to the purchase being considered.

The only way to create these mappings between ‘non-adjacent’ entities is to crowdsource the problem, since the relationships in question are a facet of human experience, can’t be guessed programmatically, and are probably beyond the affordable scope of conventional approaches to dataset labeling, such as Amazon Mechanical Turk.

Therefore the researchers have employed Natural Language Processing (NLP) mechanisms to extract salient words from reviews for a product, and have used frequencies from these analyses to create embeddings capable of ‘matching’ apparently distant items.
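
To make the idea concrete, here is a minimal sketch of how word frequencies from reviews can reveal a link between items that sit far apart in a product taxonomy. It is a simplified stand-in, not the paper’s pipeline: the stop-word list, the toy reviews, and the helper names review_vector and cosine are all invented for illustration, and plain term counts with cosine similarity are used in place of whatever embedding the authors actually build.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "for", "my", "is", "it",
              "in", "this", "to", "of", "on", "our", "so"}

def review_vector(reviews):
    """Count the non-stop-word tokens across all reviews of one item."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy reviews, invented for illustration only (not from the Amazon data).
reviews = {
    "bird_cage": ["Sturdy cage, my parakeet loves the perch",
                  "Great cage for a small bird"],
    "bird_seed": ["My parakeet eats this seed daily",
                  "Good seed mix for a small bird"],
    "cat_bowl": ["The cat bowl is easy to clean",
                 "Our kitten eats from this bowl"],
}
vectors = {item: review_vector(texts) for item, texts in reviews.items()}
cage_to_seed = cosine(vectors["bird_cage"], vectors["bird_seed"])
cage_to_bowl = cosine(vectors["bird_cage"], vectors["cat_bowl"])
```

In this toy data the cage and the seed share review vocabulary (‘parakeet’, ‘small’, ‘bird’), so cage_to_seed comes out higher than cage_to_bowl, even though all three items would sit in the same pet-supplies category.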

The architecture for Review-refined Inter-item Graph Neural Network (RI-GNN).

Architecture and Data

As the new paper notes, prior works of a similar nature have exploited a logged-in user’s own review history to provide rudimentary mappings. DeepCONN and RNS both used this approach. However, this discounts the fact that a user may not have written any reviews, or any reviews pertinent to a particular item that is ‘out of range’ of their usual buying habits. Additionally, this is something of a ‘white box’ approach, since it assumes that the user has already engaged sufficiently with the outlet to create an account and log in.

The extended Graph Neural Network (GNN) proposed by the researchers takes a more oracle-driven approach, deriving true dependencies a priori, so that, presumably, the anonymous and logged-out user can experience more relevant recommendations with minimal input required.

The review-augmented system is titled Review-refined Inter-item Graph Neural Network (RI-GNN). The researchers have tested it against two datasets from Amazon, Pet Supplies and Movies and TV. Though this solves the problem of review availability rather neatly, an in-the-wild implementation would need to locate and scrape an appropriate reviews database. Such a dataset source could, in theory, be anything from posts on a social network to answers on Quora.

High-level relationship mappings of this nature would, additionally, be valuable to a range of machine learning applications beyond recommender systems. Many current projects are hamstrung by a lack of inter- and intra-domain mapping due to limited funds and scope, whereas the commercial impetus of a truly knowledgeable and crowdsourced ecommerce recommender system could potentially fill that gap.

Metrics and Testing

The authors tested RI-GNN against two versions of each dataset, each of which comprises a user’s purchase history and general reviews of the product. Items appearing fewer than five times were removed, and the user history was split into units of a week. The first dataset version featured all sessions with more than one item, and the second all sessions with more than five items.
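
The preprocessing described above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors’ code: the function name build_sessions, the (user, item, date) event format, and the use of ISO calendar weeks as the ‘unit of a week’ are all guesses, as is the choice to count item frequency before sessions are formed.

```python
from collections import Counter, defaultdict
from datetime import date

def build_sessions(events, min_item_count=5, min_session_len=2):
    """Group (user_id, item_id, date) events into weekly sessions.

    Assumptions (not confirmed by the paper): a 'week' is an ISO
    calendar week, and rare items are dropped before grouping.
    """
    events = list(events)
    item_counts = Counter(item for _, item, _ in events)
    sessions = defaultdict(list)
    # Sort by date so that items inside a session keep purchase order.
    for user, item, day in sorted(events, key=lambda e: e[2]):
        if item_counts[item] < min_item_count:
            continue  # drop items seen fewer than min_item_count times
        iso = day.isocalendar()
        sessions[(user, iso[0], iso[1])].append(item)  # key: user, ISO year, ISO week
    # Keep only sessions that are long enough.
    return [items for items in sessions.values() if len(items) >= min_session_len]
```

Under these assumptions, the first dataset version would correspond to min_session_len=2 (more than one item) and the second to min_session_len=6 (more than five items).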

The project used P@K (Precision) and MRR@K (Mean Reciprocal Rank) for its evaluation metrics. Rival architectures tested were: S-KNN; GRU4Rec; S-POP; STAMP; BERT4Rec; DHCN; GCE-GNN; SR-GNN; and NARM.
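
For readers unfamiliar with these metrics, the sketch below shows how they are commonly computed in session-based recommendation, where each test session has a single ‘next’ item as its target. The function names and the default K of 20 are assumptions for illustration; the paper’s exact definitions may differ in detail.

```python
def precision_at_k(ranked_lists, targets, k=20):
    """Share of test cases whose target item appears in the top-k list."""
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)

def mrr_at_k(ranked_lists, targets, k=20):
    """Mean of 1/rank of the target item; counts 0 when it is outside the top k."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top_k = ranked[:k]
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)  # ranks start at 1
    return total / len(targets)
```

P@K rewards a system for getting the right item anywhere in the top K, while MRR@K additionally rewards it for placing that item nearer the top of the list.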

The framework was trained with the Adam optimizer in batches of 100 at a learning rate of 0.001, with the number of topics set to 24 for Pet Supplies and 20 for Movies and TV.

First published 1st February 2022.