Published on
October 6, 2024
Written by
Noa Kalmanovich
Our collaboration with the Ifop Group was presented at ESOMAR’s annual Congress in Athens, Greece, in the white paper “Synthetic Data in Marketing Studies: Exploring the promise of generative AI and synthetic data.” To address a key industry challenge in data collection, we worked with Thomas Duhard, Head of Data Projects at Ifop, to push the boundaries of AI and understand what synthetic samples can offer our industry’s search for insights.
Below is a brief overview of the content. You can also download the full paper and watch our presentation from the ESOMAR Congress event earlier this month.
Standard data collection practices often struggle to balance fundamental economic and technical factors, such as ensuring representativeness, achieving sufficient sample sizes, and maintaining data quality. Augmented respondents offer a straightforward solution: we narrow the scope and boost real data with AI-generated synthetic samples.
In the paper, the authors demonstrate the effectiveness of synthetic sample boosters through more than 7,000 parallel tests on datasets from the Pew Research Center. The tests compare real boosts with AI-generated boosts and illustrate how synthetic boosting can improve samples of low-incidence populations, which are often hard to analyze.
The paper then explains the methodology behind the calculation of Effective Sample Sizes (ESS) and boost factors, concluding that, at the sub-segment level, a Fairgen boost is on average as reliable as three times the amount of real data.
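To give a feel for what a threefold effective sample size can mean in practice, here is a minimal sketch. It does not reproduce the paper's ESS methodology, which is not detailed in this overview. It assumes the textbook margin-of-error formula for a proportion under simple random sampling, a worst-case proportion of 0.5, a 95% confidence level, and a hypothetical sub-segment of 100 real respondents. The function names and the sub-segment size are illustrative assumptions.

```python
import math


def margin_of_error(n: float, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes simple random sampling; p=0.5 is the worst case, z=1.96 is ~95%.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


def boost_factor(effective_n: float, real_n: float) -> float:
    """Ratio of the effective sample size to the number of real respondents."""
    if real_n <= 0:
        raise ValueError("real_n must be positive")
    return effective_n / real_n


# Hypothetical sub-segment size (an assumption, not a figure from the paper).
real_n = 100
# Average boost factor quoted in the overview above.
assumed_boost = 3.0
effective_n = real_n * assumed_boost

moe_real = margin_of_error(real_n)
moe_boosted = margin_of_error(effective_n)

print(f"boost factor: {boost_factor(effective_n, real_n):.1f}")
print(f"margin of error, real only: {moe_real:.3f}")
print(f"margin of error, boosted:   {moe_boosted:.3f}")
```

Under these assumptions, tripling the effective sample size shrinks the margin of error by a factor of 1/√3, or roughly 42%. It does not cut it to a third, because precision grows with the square root of the sample size.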
The paper then showcases a study on the European elections, in which Ifop used our synthetic boosts to augment a key swing group: secondary school teachers. The political poll covered a representative sample of 8,000 French adults, of whom only 116 were teachers. Augmented synthetic respondents boosted this group to 580 respondents, correcting inconsistencies and aligning the sample with sociological plausibility, and ultimately provided a better read on this rare demographic’s influence on the election’s outcome. The results also showed that AI can reliably mimic human responses, enhancing the representation of niche groups.
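The figures in this case study also show why low-incidence groups are costly to reach with fieldwork alone. The sketch below uses only the numbers quoted above (8,000 respondents, 116 teachers, 580 after boosting). It assumes that collecting more real teachers would mean scaling the whole sample at the same incidence rate, which is a simplification. The function name is a hypothetical helper.

```python
def required_total_sample(target_segment: int, segment_n: int, total_n: int) -> float:
    """Total sample needed to reach target_segment respondents in a sub-group,

    assuming the sub-group keeps the same incidence (segment_n / total_n).
    """
    if segment_n <= 0 or total_n <= 0:
        raise ValueError("sample sizes must be positive")
    return target_segment * total_n / segment_n


total_sample = 8_000      # representative sample of French adults
real_teachers = 116       # real respondents in the teacher group
boosted_teachers = 580    # group size after synthetic boosting

incidence = real_teachers / total_sample
size_multiplier = boosted_teachers / real_teachers
needed_total = required_total_sample(boosted_teachers, real_teachers, total_sample)

print(f"incidence of teachers: {incidence:.2%}")
print(f"boosted group is {size_multiplier:.0f}x the real group")
print(f"real-only sample needed for {boosted_teachers} teachers: {needed_total:,.0f}")
```

Under that proportional-scaling assumption, fielding 580 real teachers would take a total sample of about 40,000 respondents, five times the original poll.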
While the industry benefits from the economic gain and flexibility offered by augmented synthetic respondents, the paper also highlights several key concerns surrounding synthetic data.
Samuel and Thomas address these challenges and propose responsible deployment strategies to set a standard for ethical and effective use of augmented synthetic samples.
In conclusion, while augmenting real data promises unprecedented granular insights, it is essential to operate within the technology’s limitations. Careful deployment is vital for maintaining data quality and preventing misuse.
Through this collaboration, Fairgen and Ifop demonstrate that synthetic data is a powerful and viable tool for modern quantitative research. When its limitations are acknowledged and its potential is put to full use, synthetic data can drive granular recommendations and propel the industry forward.
Access the full paper and watch our talk from the ESOMAR Congress event here.