Field experiences – Optimizing the data collection process and improving the accuracy of data collected by field agents in remote field areas

Primary health care research involves data collection from remote field areas. A field area can be a small isolated ghetto, an urban slum, a remotely located village, an urban administrative block or any other location with a certain number of households with permanent residents. Data collection from isolated or remotely located field areas poses logistic challenges as well as challenges in effective data management.(1) With ever-evolving technology, “electronic data collection” is deemed to be the best method for data collection from remote field areas, with the added advantages of easy monitoring and predesigned data quality checks.(2)
We planned and implemented a household-level survey at our field site, Alakudi village, approximately 10 km from our head office in Thanjavur, Tamil Nadu. The survey was designed to update our 5-year-old database with a current census of the village population. According to our previous database, the number of households ranged from 1115 to 1200, including temporary houses, and the population ranged from 3800 to 4000. Our objective was to update the database with the current census of the village while effectively identifying changes, that is, additions and deletions of individuals and households.
We planned to carry out electronic data collection through field agents. We encountered common problems, which we tackled through a planned strategy.

The problems we anticipated and actually encountered can be classified into two broad categories:
• Logistic problems: technical issues with electronic devices, network issues, and social and cultural constraints during data collection.
• Data management problems: incomplete data, inaccurate data and duplicate data.

We devised a strategy to overcome each challenge or, where that was not possible, to minimize its effect.
The first step was to recruit the agents. Researchers often face a dilemma: whether to recruit agents from the local population or from a different community. Local agents have several advantages. They are well acquainted with the community, so respondents are more comfortable sharing personal information and other data with them; working hours can be adjusted because the agent lives close to the community; and community interaction and participation are higher. With this in mind, we recruited 4 agents (1 male, 3 females) from the local community. We also recruited 4 agents (all female) from outside the community to accelerate the data collection process.
We trained all the agents for 3 days using a self-developed training module. The training covered our data collection system, methods to manually trace households and persons using the lists of households and individuals provided in separate booklets (based on the old database), handling the devices, managing queries, and the dos and don’ts of data collection. After the training, we started data collection. On the first day, investigators accompanied the field agents in the field to resolve any real-time issues with data collection.
The second step was to monitor the data. We monitored the data collected by each agent on a daily basis. The data was uploaded directly from the agents’ devices to our server over the mobile network at the end of each day.

On the next working day, we checked the data for:
1. Any missing fields
2. Any duplicate records
3. Any erroneous entry
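
The three checks listed above could be automated along the following lines. This is only a minimal sketch, not the system we actually used: the field names (`household_id`, `person_id`, `full_name`, `date_of_birth`), the duplicate key and the single "future date of birth" rule are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical field names; the real survey form is not described in this post.
REQUIRED_FIELDS = ("household_id", "person_id", "full_name", "date_of_birth")


def check_records(records, today):
    """Return a list of (record_index, problem) tuples for one day's upload."""
    problems = []

    # 1. Missing fields: a required key is absent, None, or blank.
    for i, rec in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            if value is None or str(value).strip() == "":
                problems.append((i, f"missing {field}"))

    # 2. Duplicate records: the same (household_id, person_id) appears twice.
    keys = [(rec.get("household_id"), rec.get("person_id")) for rec in records]
    counts = Counter(keys)
    for i, key in enumerate(keys):
        if None not in key and counts[key] > 1:
            problems.append((i, "duplicate household/person key"))

    # 3. Erroneous entries: as an example rule, a date of birth in the future.
    for i, rec in enumerate(records):
        dob = rec.get("date_of_birth")
        if isinstance(dob, date) and dob > today:
            problems.append((i, "date_of_birth in the future"))

    return problems


# Made-up sample records, for illustration only.
sample = [
    {"household_id": "H001", "person_id": "P01", "full_name": "A",
     "date_of_birth": date(1990, 5, 1)},
    {"household_id": "H001", "person_id": "P01", "full_name": "A",
     "date_of_birth": date(1990, 5, 1)},
    {"household_id": "H002", "person_id": "P02", "full_name": "",
     "date_of_birth": date(2999, 1, 1)},
]
issues = check_records(sample, today=date(2020, 1, 1))
for index, problem in issues:
    print(f"record {index}: {problem}")
```

Each flagged record can then be turned into a query for the field agent who collected it. In practice the list of "erroneous entry" rules would be much longer and specific to the questionnaire.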
We also compared the data with the old database to check for any variation in address, latitude and longitude at the household level, and in full name and date of birth at the individual level. This enabled us to confirm that each person and household had been identified correctly and that the new data matched the true data.
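
A comparison of this kind might look like the sketch below. It assumes that old and new records are plain dictionaries with the keys shown, and that a household is flagged when its newly recorded GPS point lies more than 50 m from the old one. Both the key names and the 50 m tolerance are hypothetical; we do not state a threshold in this post.

```python
import math

# Assumed tolerance in metres before a household location is flagged.
MAX_DRIFT_M = 50.0


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points (haversine formula)."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def same_text(a, b):
    """Compare two strings, ignoring case and surrounding whitespace."""
    return a.strip().lower() == b.strip().lower()


def compare_household(old, new):
    """Return the household-level fields that differ between old and new records."""
    diffs = []
    if not same_text(old["address"], new["address"]):
        diffs.append("address")
    if distance_m(old["lat"], old["lon"], new["lat"], new["lon"]) > MAX_DRIFT_M:
        diffs.append("location")
    return diffs


def compare_person(old, new):
    """Return the individual-level fields that differ between old and new records."""
    diffs = []
    if not same_text(old["full_name"], new["full_name"]):
        diffs.append("full_name")
    if old["date_of_birth"] != new["date_of_birth"]:
        diffs.append("date_of_birth")
    return diffs
```

An empty list means the new record agrees with the old database on those fields. A non-empty list does not necessarily mean an error, since households move and names get corrected, so each difference is better treated as a query to verify than as an automatic rejection.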
The third step was to raise queries and give feedback. We raised a query about any discrepancy encountered during data monitoring and asked the respective field agent and the field coordinator to resolve it as soon as possible. Each agent was given feedback on a daily basis and was asked for feedback in return. This helped us build rapport between the field agents and the investigators and streamline the rest of the process.
We observed a cumulative error rate of 8% per 100 households for the first 3 days. The error rate decreased to 3% per 100 households over the next 5 days, and during the last phases of data collection it was between 2% and 3% per 100 households.
To conclude, proper training, initial guidance during data collection, daily data monitoring, quick and easy query resolution, and interactive feedback helped us improve the quality of the data collected by field agents by reducing errors.
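
For readers who want to track a similar figure, one simple way to compute an error rate per 100 households is sketched below. It assumes the rate is the number of households with at least one flagged error divided by the number of households surveyed in the period; that definition is an assumption for illustration. The counts in the example are made up and are not our survey data.

```python
def error_rate_per_100(households_with_errors, households_surveyed):
    """Households with at least one flagged error, per 100 households surveyed."""
    if households_surveyed <= 0:
        raise ValueError("households_surveyed must be positive")
    if not 0 <= households_with_errors <= households_surveyed:
        raise ValueError("households_with_errors must be between 0 and households_surveyed")
    return 100.0 * households_with_errors / households_surveyed


# Made-up counts, for illustration only.
print(error_rate_per_100(24, 300))  # 24 of 300 households flagged
print(error_rate_per_100(9, 300))   # 9 of 300 households flagged
```

Computing the rate per period (for example, per day or per phase) rather than only cumulatively makes it easier to see whether feedback to the agents is having an effect.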

1. Weir CR, Nebeker JR. Critical issues in an electronic documentation system. AMIA Annu Symp Proc. 2007;786–90.
2. Duracinsky M, Lalanne C, Goujard C, Herrmann S, Cheung-Lung C, Brosseau J-P, et al. Electronic versus paper-based assessment of health-related quality of life specific to HIV disease: reliability study of the PROQOL-HIV questionnaire. J Med Internet Res. 2014;16(4):e115.

Link for the training module.

