IntelliSurvey Blog

Survey Data Cleaning: Ensuring Quality and Reliability

Written by IntelliSurvey Staff | February 8, 2024

Data quality, reliability, and integrity are paramount in survey research, and survey data cleaning plays a pivotal role in achieving all three. Bad data and bad actors can creep into any survey, so it is important to be prepared to cleanse your data for the best results.

What is Survey Data Cleaning?

Data cleaning (or data scrubbing) is the process of reviewing the data you’ve collected and removing erroneous, incorrect, or otherwise suspect results that could skew your analysis. 

Think of data cleaning as an extra quality control step. By taking this extra step, you’ll have higher-quality data for more reliable insights. 

What Causes Bad Data?

Various factors, such as data entry errors, technical issues, low-quality responses, and fraudulent respondents, can cause bad data in survey responses. The questionnaire can also be to blame: poorly worded questions (such as leading or double-barreled questions) can jeopardize data quality.

Poor-quality data is especially harmful for low-incidence surveys where even a small amount of bad data can drastically change the results.  

When Should You Clean Your Survey Data?

You don’t have to wait until after fielding your survey to start cleaning your data. In fact, you can (and should) begin as soon as possible, which includes testing before you launch your survey to catch potential issues preemptively. 

On platforms like IntelliSurvey, some cleaning happens in real-time as soon as respondents begin answering your questionnaire. For example, our CheatSweep™ proprietary data cleansing algorithm automatically detects and discards obvious cheaters. Bad-quality respondents are also flagged and removed so they don’t appear in online reports or count towards quotas. 

Manual cleaning is also very important after the respondent has completed the survey. We suggest cleaning data on a daily basis (if not more) to ensure poor data does not affect your quotas while in field.

What Survey Data Should You Clean? 

When cleaning your survey data, you should focus on identifying and addressing various types of errors, inconsistencies, and outliers. Here are several areas you should consider cleaning. 

Speeders

Respondents who complete a survey much faster than what would be considered reasonable are called "speeders." These individuals rush through the survey, providing answers without giving much thought to the questions or their responses.
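
As a rough illustration, a speeder check can compare each respondent's completion time with the median for the survey. The sketch below is not how any particular platform does it. The respondent IDs, the timings, and the cutoff of one third of the median are all assumptions chosen for the example, and the right cutoff depends on your questionnaire.

```python
from statistics import median

def flag_speeders(durations, fraction=1 / 3):
    """Return IDs of respondents who finished in less than
    `fraction` of the median completion time (an assumed cutoff)."""
    cutoff = median(durations.values()) * fraction
    return sorted(rid for rid, secs in durations.items() if secs < cutoff)

# Hypothetical completion times in seconds, keyed by respondent ID.
durations = {"r1": 610, "r2": 580, "r3": 95, "r4": 720, "r5": 640}

speeders = flag_speeders(durations)
print(speeders)  # ['r3']
```

Using the median rather than the mean keeps a few very slow respondents (for example, people who left the survey open overnight) from inflating the cutoff.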

Straightlining/patterns

Straightlining refers to the behavior of consistently selecting the same response option (e.g., always choosing “Strongly Agree” or “Yes”) for multiple questions in a row, regardless of the content of each question. Researchers often use strategies such as setting a minimum response variability threshold to address this issue. Respondents who consistently provide the same responses or patterns for all questions may have their data reviewed or excluded from the analysis.  
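
A minimum response variability threshold can be as simple as counting how many different answers a respondent gave within one grid of rating questions. The following is a minimal sketch under that assumption. The function name, the sample grids, and the threshold of two distinct answers are hypothetical.

```python
def is_straightliner(grid_answers, min_distinct=2):
    """True when a rating grid contains fewer than `min_distinct`
    different answers (an assumed variability threshold)."""
    return len(set(grid_answers)) < min_distinct

# Hypothetical answers to a six-item agreement grid (1-5 scale).
grids = {
    "r1": [5, 5, 5, 5, 5, 5],  # same answer for every item
    "r2": [4, 2, 5, 3, 4, 1],
}

flagged = [rid for rid, answers in grids.items() if is_straightliner(answers)]
print(flagged)  # ['r1']
```

A distinct-answer count only catches pure straightlining. Repeating patterns such as 1, 2, 3, 1, 2, 3 need a separate check, and a flagged grid is a prompt for review rather than proof of a bad respondent.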

Outliers

Outliers can introduce inaccuracies to the data set via extreme values that significantly deviate from the rest of the responses. They may represent data errors (such as keying in 22 children instead of 2 children in response to a demographic question), misunderstandings by respondents, or other anomalies. Once outliers are identified, it is up to the researcher to decide whether to remove, transform, or handle them differently based on their specific objectives. 
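
One common way to surface outliers for review is the interquartile range (IQR) rule, which flags values that fall far outside the middle half of the data. The sketch below applies it to the number-of-children example. The data, the function name, and the multiplier of 1.5 are assumptions for illustration, and other rules may suit your data better.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return IDs whose value lies more than k * IQR below the first
    quartile or above the third quartile."""
    q1, _, q3 = quantiles(values.values(), n=4)
    fence = k * (q3 - q1)
    return [rid for rid, v in values.items()
            if v < q1 - fence or v > q3 + fence]

# Hypothetical answers to "How many children do you have?"
children = {"r1": 0, "r2": 1, "r3": 2, "r4": 2, "r5": 3,
            "r6": 1, "r7": 2, "r8": 22, "r9": 0, "r10": 2}

outliers = iqr_outliers(children)
print(outliers)  # ['r8']
```

The check only identifies candidates. Whether to remove, correct, or keep a flagged value is still the researcher's decision.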

Open-end responses

Open-ended responses contain unstructured, free-text information. While valuable, they are also susceptible to noise, such as irrelevant comments and spam. Including open-ended questions in your survey, and cleaning the responses, is essential to ensuring data quality, consistency, and reliability.
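
Much of open-end cleaning requires a human reader, but a few simple checks can narrow down what to read first. The sketch below flags answers with almost no letters and answers that appear word-for-word from more than one respondent. Both rules, the function name, and the sample text are assumptions for illustration, and they will not catch every kind of noise.

```python
import re
from collections import Counter

def flag_open_ends(responses, min_letters=3):
    """Map respondent ID to a reason for review: 'too short' or 'duplicate'."""
    # Normalize whitespace and case so trivial differences do not hide copies.
    normalized = {rid: re.sub(r"\s+", " ", text).strip().lower()
                  for rid, text in responses.items()}
    counts = Counter(text for text in normalized.values() if text)
    flags = {}
    for rid, text in normalized.items():
        if sum(ch.isalpha() for ch in text) < min_letters:
            flags[rid] = "too short"
        elif counts[text] > 1:
            flags[rid] = "duplicate"
    return flags

# Hypothetical answers to "What do you like about the product?"
responses = {
    "r1": "I like the battery life and the price.",
    "r2": "..",
    "r3": "Great product, would buy again",
    "r4": "great product,  would buy again ",
}

open_end_flags = flag_open_ends(responses)
print(open_end_flags)
```

Short, legitimate answers such as "none" can repeat across respondents, so treat a duplicate flag as a reason to look closer rather than to discard.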

Inconsistent or unrealistic responses

Inconsistent or unrealistic responses, such as self-contradictory answers, can introduce errors and inaccuracies in the data set. For example, a respondent might say they have no kids and later state that they spend money on daycare for two of their children. Another might say they watch 200 hours of TV every week when there are only 168 hours in a week.
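
Checks like these are easy to express as rules that compare one answer against another, or against a hard limit. Here is a minimal sketch covering a daycare contradiction and an impossible number of TV hours. The field names and sample records are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168

def consistency_issues(resp):
    """Return a list of logic problems found in one respondent's answers."""
    issues = []
    if resp["num_children"] == 0 and resp["daycare_children"] > 0:
        issues.append("daycare spending reported with no children")
    if resp["tv_hours_per_week"] > HOURS_PER_WEEK:
        issues.append("TV hours exceed the hours in a week")
    return issues

# Hypothetical respondent records.
respondents = {
    "r1": {"num_children": 0, "daycare_children": 2, "tv_hours_per_week": 10},
    "r2": {"num_children": 1, "daycare_children": 1, "tv_hours_per_week": 200},
    "r3": {"num_children": 2, "daycare_children": 0, "tv_hours_per_week": 14},
}

issues_by_id = {rid: consistency_issues(r) for rid, r in respondents.items()}
print(issues_by_id)
```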

Failed attention checks

Attention checks help assess whether respondents are paying proper attention to the survey questions and responding thoughtfully. These can be static, such as “please choose the color ‘red’ from this list,” or dynamic, such as a check whose correct answer changes so that bots cannot learn how to get through.
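
To make the static versus dynamic distinction concrete, the sketch below builds a color-choice check whose target is picked at random for each respondent, so the correct answer is not the same every time. The color list, function names, and dictionary layout are assumptions for illustration, not a description of any platform's implementation.

```python
import random

COLORS = ["red", "blue", "green", "yellow"]

def make_attention_check(rng):
    """Build a check with a randomly chosen target color."""
    target = rng.choice(COLORS)
    return {
        "prompt": f"Please choose the color '{target}' from this list.",
        "options": list(COLORS),
        "expected": target,
    }

def passed(check, answer):
    """True when the respondent picked the color the prompt asked for."""
    return answer == check["expected"]

# A seeded generator is used here only to make the example repeatable.
check = make_attention_check(random.Random(0))
print(check["prompt"])
print(passed(check, check["expected"]))  # True
```

A static check is the same idea with the target fixed, for example always "red".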

Technology flags

Technology flags are mechanisms used to identify and manage issues related to the use of technology by survey respondents. Monitoring IP addresses, for example, can help identify respondents attempting to take the survey multiple times. Geolocation, country, and time zone checks are also commonly used to identify respondents from unexpected locations.
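
As a simple illustration, the sketch below flags completes that share an IP address and completes from a country outside the target market. The record layout, the function name, and the sample data are assumptions, and the IP addresses come from ranges reserved for documentation.

```python
from collections import defaultdict

def technology_flags(records, expected_countries):
    """Map respondent ID to a list of technology flags for review."""
    ids_by_ip = defaultdict(list)
    for rec in records:
        ids_by_ip[rec["ip"]].append(rec["id"])

    flags = defaultdict(list)
    for ids in ids_by_ip.values():
        if len(ids) > 1:  # the same IP address appears on several completes
            for rid in ids:
                flags[rid].append("duplicate IP")
    for rec in records:
        if rec["country"] not in expected_countries:
            flags[rec["id"]].append("unexpected country")
    return dict(flags)

# Hypothetical completes for a US-only survey.
records = [
    {"id": "r1", "ip": "203.0.113.5", "country": "US"},
    {"id": "r2", "ip": "198.51.100.7", "country": "US"},
    {"id": "r3", "ip": "203.0.113.5", "country": "US"},
    {"id": "r4", "ip": "192.0.2.44", "country": "BR"},
]

tech_flags = technology_flags(records, expected_countries={"US"})
print(tech_flags)
```

Shared IP addresses can be legitimate, such as members of one household or colleagues in one office, so these flags work best alongside the other checks rather than as an automatic removal rule.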

Data Cleaning With IntelliSurvey

Survey data cleaning is essential in ensuring the quality, reliability, and integrity of the collected data. It serves as an extra layer of quality control, addressing a wide range of issues that can compromise the accuracy of research findings. 

IntelliSurvey provides both automated and manual data cleaning on every project. Our market research experts start at the questionnaire level to offer advice on best practices for quality control. We continue to support you in field with interim data cleansing to ensure proper quota filling without delays, and we are on-hand to help once the survey closes, too. 

For more information on how our approach to data quality can benefit your next project, please get in touch.