May I Have A Copy Of Your Receipt? Crowdsourcing Data Collection

To report what people buy in grocery stores, market research companies around the world rely heavily on scanning data collected at the point of sale. It’s very rare, however, for every major retail chain in a given market to cooperate. In fact, in many countries, chains outside the cooperation system can account for up to 10% of the turnover of fast-moving consumer goods. While consumer panels can help, they are typically not large enough to adequately represent these missing chains.

Nielsen has been addressing this issue for many years now by collecting receipts directly from consumers, in front of stores outside the cooperation system; more than 4 million receipts are collected that way each year in 20 different countries. This method works, but it is time-consuming and expensive. Thankfully, recent technological developments have made it possible to work out a new way of estimating purchases made at these chains: asking shoppers to download a special app on their mobile phone and use that app to take a photo of their shopping receipts.

To test this promising data collection approach, Nielsen kicked off a proof-of-concept project in the U.K., where 6,000 users have already signed up and sent us more than 90,000 pictures of store receipts. What do we do with those pictures once we’ve received them? In the first phase of the project, we crowdsourced them to human readers (via Amazon’s Mechanical Turk) to make sure the data was transcribed correctly. However, despite our best coaching efforts, important pieces of information were still missing or misinterpreted.

We’re now processing the pictures via a sophisticated optical character recognition (OCR) solution. OCR comes with great automation benefits, but it’s also an enormous technological challenge in our particular situation. We have to deal with a large volume of images with noisy backgrounds, and developing an algorithm to remove that noise is not a trivial endeavor. Another difficulty is properly transforming each receipt image into a set of database entries: the algorithm needs to find the relevant information on each receipt, interpret each data point accurately regardless of its position on the receipt, and determine the correct meaning of the product descriptions found there.
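To make the second difficulty concrete, here is a minimal sketch in Python of turning OCR output into structured entries. It is an illustration under stated assumptions, not a description of Nielsen’s actual pipeline: it assumes the OCR step has already produced plain text, one receipt line per text line, and the names `ITEM_RE`, `NON_ITEM_RE`, `LineItem` and `parse_receipt`, along with the keyword list, are hypothetical.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import List

# Assumed layout: a description followed by a price at the end of the line,
# e.g. "SEMI SKIMMED MILK 2L   £1.25" or "BANANAS LOOSE 0.89".
ITEM_RE = re.compile(r"^(?P<desc>.+?)\s+£?(?P<price>\d+\.\d{2})$")

# Lines that carry an amount but are not purchased items.
# This keyword list is an assumption; real receipts vary by chain.
NON_ITEM_RE = re.compile(
    r"\b(TOTAL|SUBTOTAL|CHANGE|CASH|CARD|VAT|BALANCE)\b", re.IGNORECASE
)


@dataclass
class LineItem:
    description: str
    price: Decimal


def parse_receipt(ocr_text: str) -> List[LineItem]:
    """Extract item lines from OCR text, wherever they sit on the receipt."""
    items = []
    for raw in ocr_text.splitlines():
        line = raw.strip()
        # Skip blank lines and totals/payment lines.
        if not line or NON_ITEM_RE.search(line):
            continue
        match = ITEM_RE.match(line)
        if match:
            items.append(
                LineItem(match.group("desc").strip(), Decimal(match.group("price")))
            )
    return items
```

Because each line is matched on its own, store headers and addresses without a trailing price are ignored, and items are picked up regardless of where they appear. What a sketch like this cannot do is the hardest part: deciding which product an abbreviated description such as "SEMI SKIMMED MILK 2L" actually refers to.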

Beyond these technological obstacles, we also need to address methodological challenges to make this approach viable: How do we find the people we need? What can we do to convince them to use the app? How do we encourage compliance? How can we streamline the whole process?

We will first explore how we can get usable information from the receipt images we’re collecting in the U.K. Our aims are to capture in our databases exactly what is printed on each receipt and to understand what that information means. We’ll then evaluate how accurately the data represents the reality of all U.K. shoppers.