Perspectives: Product Reference Data for Digital Business

Once upon a time, “stores” were buildings made of bricks and mortar that sold an inventory of a few thousand products during regular business hours. The digital revolution changed all that: e-commerce retailers created web shops that were open 24 hours a day, seven days a week.

Consumers quickly embraced the convenience of e-commerce, and traditional retailers had no option but to follow. Shoppers came to expect a seamless omnichannel experience, in which products are described and identified identically in store, online, in a mobile app and in proximity-driven promotions delivered directly to their smartphones.

Unconstrained by the need to build and stock physical stores, e-commerce retailers are able to offer a huge inventory of products in endless aisles. A small brick-and-mortar consumer goods retailer may have an inventory of about 5,000 products; a supercenter might top out at about 130,000-150,000 products. Amazon’s online inventory in the U.S., however, exceeds 480 million products. As a result, the product coding problem in an e-commerce environment is massively larger than in brick-and-mortar retail.

Products are also evolving more quickly than in years past, and new product characteristics are constantly being introduced. The fast-moving consumer goods (FMCG) category alone adds more than 300,000 new products a month worldwide, complemented by 2.5 million product evolutions and extensions.

The volume of products and the rate of change are well beyond what the clerical methods for maintaining product reference data in a brick-and-mortar world can handle. Those physical-world processes simply will not scale to e-commerce: they are too costly and too slow.

The challenge is not only a matter of volume and rate of change, but also richness. The barcodes, textual descriptions and characteristics that have served the industry well for the last 25 years are still necessary, but they’re no longer sufficient.

Consumers buying online want a 360-degree view of the product. Who is going to buy a $500 TV based only on a text description and a front view of the product when they can get an equally trusted brand for the same price that can be viewed from four angles, complete with pictures of the inputs and outputs, and a video offering additional information?

Not only does product data have to be richer, it has to support many ways of describing the same item. For international companies, product descriptions need to be in multiple languages and recognize differences in culture and laws. The defining characteristics of a product vary depending on whether you are manufacturing, warehousing, distributing, marketing, retailing or purchasing it. Attempting to apply a single view across the product lifecycle—most frequently the one used by the finance department—doesn’t work. These different views need to be reconciled and integrated.

Further, while manufacturers maintain the master data for the products they create and retailers maintain master data for the products they stock, this data only gives them a good view of the products they deal in. That’s great for monitoring internal processes, but it can’t be used to compare performance against competitors or external benchmarks, or to support product innovation.

So what are manufacturers and retailers to do? There is no single answer, and certainly no silver bullet: enormous advances in commerce always bring big new challenges.

That said, we know that big data problems can be solved only with automated techniques that reserve humans for exception processing. With that in mind, there are a number of things big companies should be staying abreast of as they refine their online offerings:

  • Web scraping—downloading the public content of web sites—is a viable way to collect unstructured product information. Once web data is extracted, machine-learning algorithms can parse it to create an inventory of structured product descriptions. When used correctly, web scraping is a good way to gather a large amount of product information quickly and cheaply, but the information is often inconsistent or incomplete.
  • Machines can use image matching (“have we seen this product before?”) to identify products in pictures. When this isn’t possible, they can extract logos, barcodes and branding, and perform optical character recognition on any text they find. The machine would then cross-reference this information against existing product data to suggest amendments to existing coding or new coding for novel items.
  • The crowd is another source of product information. Crowdsourcing, the gathering of knowledge from a participative online audience, goes against the “don’t use humans” mantra, but it removes the subject-matter-expert bottleneck and delivers a web-scale process for recording on-pack product information. It can go where machines cannot (yet) reach, and it can do so at scale and at reasonable cost.
  • In the data warehouse world, product reference data is coded to a gold standard of completeness and quality. In the big data world, it makes sense for product coding to be more pragmatic and provide content that is good enough, just in time: even small, incremental efforts can be costly on huge datasets.
  • Finally, things can change quickly and unpredictably. Until machine intelligence advances to the point where capturing and coding incremental product information is practically cost-free, it will be cost effective to curate only information of immediate interest. In the meantime, however, it will make sense to save raw product information (captured from web scrapes, the crowd, product images, etc.) so that it can be mined for additional information should priorities change.
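As a rough illustration of how several of these ideas could fit together, the sketch below cross-references a scraped product title against existing reference data, accepts confident matches automatically, routes everything else to a human for exception processing, and keeps the raw text so it can be mined again later. It is a minimal sketch under stated assumptions, not a description of any production system: the catalog contents, the product codes, the 0.85 threshold and the names normalize, match and MatchResult are all hypothetical, and simple string similarity from the Python standard library stands in for the machine-learning parsing described above.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical reference catalog: product code -> curated description.
REFERENCE = {
    "P001": "acme cola 330ml can",
    "P002": "acme cola zero 330ml can",
    "P003": "brightwash laundry detergent 2l",
}

# Assumed cut-off; a real system would tune this against labelled data.
AUTO_ACCEPT = 0.85


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


@dataclass
class MatchResult:
    raw: str              # raw scraped text, kept for later re-mining
    code: Optional[str]   # matched product code, or None if unresolved
    score: float          # similarity of the best candidate (0.0 to 1.0)
    needs_review: bool    # True when a human should handle the exception


def match(raw: str) -> MatchResult:
    """Cross-reference one scraped title against the reference catalog."""
    cleaned = normalize(raw)
    best_code, best_score = None, 0.0
    for code, description in REFERENCE.items():
        score = SequenceMatcher(None, cleaned, description).ratio()
        if score > best_score:
            best_code, best_score = code, score
    if best_score >= AUTO_ACCEPT:
        return MatchResult(raw, best_code, best_score, needs_review=False)
    # Low confidence: do not guess a code, queue the item for a person.
    return MatchResult(raw, None, best_score, needs_review=True)


if __name__ == "__main__":
    for title in ["ACME Cola, 330 ml Can!", "Zzyx organic quinoa crisps"]:
        print(match(title))
```

The threshold is where the "good enough, just in time" trade-off shows up: lowering it sends fewer items to human reviewers at the price of more miscoded products, and raising it does the reverse.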

These are all solutions-in-progress. Obviously, given the scale of the challenge, solving the product reference data conundrum is not a go-it-alone proposition. Not surprisingly, there are a number of bodies that curate standards for the classification of products. One of the most notable is GS1, which maintains standards to support supply chains and traceability across several sectors. Product coding is the responsibility of the company that adopts the standards—often consumer packaged goods manufacturers. GS1 has a program to ensure the correct application of the standards, but its operating model does not prevent different manufacturers from applying standards inconsistently. However, the increasing adoption of e-commerce (which will make inconsistency visible to consumers) will likely drive standards adherence.
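One small piece of standards adherence can be checked mechanically. The article does not name a specific identifier, so the following is an assumption chosen for illustration: the GTIN, the GS1 number encoded in retail barcodes, ends in a check digit computed from the preceding digits with alternating weights of 3 and 1. The sketch below (function names are hypothetical) validates that digit. It catches mistyped or corrupted numbers only; it says nothing about whether the descriptive attributes attached to a product were coded consistently.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the GS1 check digit for the digits that precede it."""
    total = 0
    # Walk right to left: the digit next to the check digit has weight 3,
    # the one before it weight 1, and so on, alternating.
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """Return True if code is an 8, 12, 13 or 14 digit GTIN with a valid check digit."""
    if not (code.isascii() and code.isdigit()):
        return False
    if len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])


if __name__ == "__main__":
    for candidate in ["4006381333931", "4006381333932", "036000291452"]:
        print(candidate, is_valid_gtin(candidate))
```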

Product listing brokers may be another source of product information. Examples include Brandbank (a Nielsen company), Gladson and Salsify. Individual retailers need consistent product information (e.g., photos of the same size and resolution), but they generally have different requirements from one another. The brokers provide tools and services to help manufacturers meet these requirements. More importantly, they also act as a hub for data exchange, providing a single point of contact between manufacturers and retailers. Give your information to a broker, and they will make sure each retailer gets what they need.

Brokers have product information for their own client bases, but their business models don’t support sourcing product information for companies that are not signed up. So brokers are a vital component of the product reference solution, but they are not a complete solution. Indeed, there is no obvious revenue driver that would encourage a manufacturer, retailer, or any third-party provider of product information to step up and provide a comprehensive product registry for digital business.

To arrive at a feasible solution, manufacturers and retailers across a broad span of industries could collaborate to create foundational product reference data for the common good. The curated data would be open source, so anyone could use it to connect big data sets at no cost. These basic product attributes would support simple, generic analysis of integrated data.

However, an open-source approach would be unlikely to deliver the rich, customized, high-quality product reference data available in the FMCG sector: the cost of doing so would be high, and the open-source model would offer no corresponding revenue opportunities. A more likely scenario is that enriched and customized product information would be provided by third parties as a paid-for snap-on to the foundational, open-source product reference data.

The retail industry has to come to terms with the volume, variety and velocity of product reference data—the three Vs of big data naturally apply. It will be interesting to see if the model of proprietary enhancement atop an open-source foundation that has been successful in big data software works for product reference information, or if some wholly new solution arises.