Sampling: how to select participants in my research study?


An Bras Dermatol. 2016 May-Jun; 91(3): 326–330.

PMCID: PMC4938277

PMID: 27438200

Jeovany Martínez-Mesa,1 David Alejandro González-Chica,2 Rodrigo Pereira Duquia,3 Renan Rangel Bonamigo,3 and João Luiz Bastos4


Abstract

Background

In this paper, the basic elements related to the selection of participants for a health research study are discussed. Sample representativeness, sample frame, types of sampling, as well as the impact that nonrespondents may have on the results of a study are described. The whole discussion is supported by practical examples to facilitate the reader's understanding.

Objective

To introduce readers to issues related to sampling.

Keywords: Dermatology, Epidemiology and biostatistics, Epidemiologic studies, Sample size, Sampling studies

INTRODUCTION

The essential topics related to the selection of participants for a health research study are: 1) whether to work with samples or include the whole reference population in the study (census); 2) the sample basis (sample frame); 3) the sampling process; and 4) the potential effects nonrespondents might have on study results. We address each of these aspects, with theoretical and practical examples, in the sections that follow.

TO SAMPLE OR NOT TO SAMPLE

In a previous paper, we discussed the parameters needed to estimate the sample size.1 We define a sample as a finite part or subset of participants drawn from the target population. In turn, the target population corresponds to the entire set of subjects whose characteristics are of interest to the research team. Based on results obtained from a sample, researchers may draw conclusions about the target population with a certain level of confidence, following a process called statistical inference. When the sample contains fewer individuals than the minimum necessary but representativeness is preserved, statistical inference may be compromised in terms of precision (prevalence studies) and/or statistical power to detect the associations of interest.1 On the other hand, samples without representativeness may not be a reliable source from which to draw conclusions about the reference population (i.e., statistical inference is not deemed possible), even if the sample size reaches the required number of participants. Lack of representativeness can occur as a result of flawed selection procedures (sampling bias) or when the probability of refusal/non-participation in the study is related to the object of research (nonresponse bias).1,2

Although most studies are performed using samples, whether or not they represent any target population, census-based estimates should be preferred whenever possible.3,4 For instance, if all cases of melanoma are available in a national or regional database, and information on the potential risk factors is also available, it would be preferable to conduct a census instead of investigating a sample.

However, there are several theoretical and practical reasons that prevent us from carrying out census-based surveys, including:

  1. Ethical issues: it is unethical to include a greater number of individuals than that effectively required;

  2. Budgetary limitations: the high costs of a census survey often limit its use as a strategy to select participants for a study;

  3. Logistics: censuses often impose great challenges in terms of the staff, equipment, etc. required to conduct the study;

  4. Time restrictions: the amount of time needed to plan and conduct a census-based survey may be excessive; and,

  5. Unknown target population size: if the study objective is to investigate the presence of premalignant skin lesions in illicit drug users, lack of information on all existing users makes it impossible to conduct a census-based study.

All these reasons explain why samples are more frequently used. However, researchers must be aware that sample results can be affected by random error (or sampling error).3 To exemplify this concept, we will consider a research study aiming to estimate the prevalence of premalignant skin lesions (outcome) among individuals older than 18 years residing in a specific city (target population). The city has a total population of 4,000 adults, but the investigator decided to collect data on a representative sample of 400 participants, detecting an 8% prevalence of premalignant skin lesions. A week later, the researcher selects another sample of 400 participants from the same target population to confirm the results, but this time observes a 12% prevalence of premalignant skin lesions. Based on these findings, is it possible to assume that the prevalence of lesions increased from the first to the second week? The answer is probably not. Each time we select a new sample, we are very likely to obtain a different result. These fluctuations are attributed to "random error." They occur because the individuals composing different samples are not the same, even though they were selected from the same target population. Therefore, the parameters of interest may vary randomly from one sample to another. Despite this fluctuation, if it were possible to obtain 100 different samples of the same population, approximately 95 of them would provide prevalence estimates very close to the real value in the target population - the value that we would observe if we investigated all the 4,000 adults residing in the city. Thus, during the sample size estimation the investigator must specify in advance the maximum acceptable random error in the study. Most population-based studies use a random error ranging from 2 to 5 percentage points. Nevertheless, the researcher should be aware that the smaller the random error considered in the study, the larger the required sample size.1
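The fluctuation between samples can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the text does not give the true prevalence in the city, so a value of 10% is assumed here, and the names `population`, `sample_prevalence`, and the seed are hypothetical.

```python
import random

# Hypothetical target population: 4,000 adults, 400 of whom (10%) have the
# outcome. The 10% figure is an assumption made only for this illustration.
N_POPULATION = 4000
TRUE_CASES = 400
population = [1] * TRUE_CASES + [0] * (N_POPULATION - TRUE_CASES)


def sample_prevalence(pop, n, rng):
    """Draw a simple random sample of size n and return its prevalence (%)."""
    sample = rng.sample(pop, n)
    return 100.0 * sum(sample) / n


rng = random.Random(2016)  # fixed seed so that the run is reproducible
estimates = [sample_prevalence(population, 400, rng) for _ in range(100)]

true_prevalence = 100.0 * TRUE_CASES / N_POPULATION
within_3_points = sum(abs(e - true_prevalence) <= 3.0 for e in estimates)

print(f"true prevalence: {true_prevalence:.1f}%")
print(f"lowest / highest estimate: {min(estimates):.1f}% / {max(estimates):.1f}%")
print(f"samples within 3 percentage points: {within_3_points} of 100")
```

Under these assumed numbers, the 100 estimates differ from one another although nothing in the population has changed, and most of them should fall within a few percentage points of the assumed true value.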

SAMPLE FRAME

The sample frame is the group of individuals that can be selected from the target population given the sampling process used in the study. For example, to identify cases of cutaneous melanoma the researcher may consider using as the sample frame the national cancer registry system or the anatomopathological records of skin biopsies. Given that the sample may represent only a portion of the target population, the researcher needs to examine carefully whether the selected sample frame fits the study objectives or hypotheses, and especially whether there are strategies to overcome the sample frame limitations (see Chart 1 for examples and possible limitations).

Chart 1

Examples of sample frames and potential limitations as regards representativeness

Sample frames and their limitations

Population census
  • If the census was not conducted in recent years, data for areas with high migration might be outdated
  • Homeless or itinerant people cannot be represented

Hospital or health services records
  • Usually include only data on affected people (this is a limitation, depending on the study objectives)
  • Depending on the service, data may be incomplete and/or outdated
  • If the lists are from public units, results may differ from those of people who seek private services

School lists
  • School lists are currently available only in the public sector
  • Children/teenagers not attending school will not be represented
  • Lists are quickly outdated
  • There will be problems in areas with a high percentage of school absenteeism

List of phone numbers
  • Several population groups are not represented: individuals with no phone line at home (low-income families, young people who use only cell phones), those who spend less time at home, etc.

Mailing lists
  • Individuals with multiple email addresses have a higher chance of selection than individuals with only one address
  • Individuals without an email address may differ from those who have one in age, education, etc.


SAMPLING

Sampling can be defined as the process through which individuals or sampling units are selected from the sample frame. The sampling strategy needs to be specified in advance, given that the sampling method may affect the sample size estimation.1,5 Without a rigorous sampling plan the estimates derived from the study may be biased (selection bias).3

TYPES OF SAMPLING

Figure 1 summarizes the main sampling types. There are two major sampling types: probabilistic and nonprobabilistic.


Figure 1

Sampling types used in scientific studies

NONPROBABILISTIC SAMPLING

In the context of nonprobabilistic sampling, the likelihood of selecting some individuals from the target population is null (zero). This type of sampling does not render a representative sample; therefore, the observed results are usually not generalizable to the target population. Still, unrepresentative samples may be useful for some specific research objectives, and may help answer particular research questions, as well as contribute to the generation of new hypotheses.4 The different types of nonprobabilistic sampling are detailed below.

Convenience sampling: the participants are consecutively selected in order of appearance according to their convenient accessibility (also known as consecutive sampling). The sampling process comes to an end when the total number of participants (sample saturation) and/or the time limit (time saturation) is reached. Randomized clinical trials are usually based on convenience sampling. After sampling, participants are usually randomly allocated to the intervention or control group (randomization).3 Although randomization is a probabilistic process to obtain two comparable groups (treatment and control), the samples used in these studies are generally not representative of the target population.
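The distinction between how participants are recruited (by convenience) and how they are allocated (at random) can be sketched in a few lines. This is a minimal illustration, not the procedure of any cited trial: the function `randomize`, the patient identifiers, and the equal split into two arms are assumptions.

```python
import random


def randomize(participant_ids, seed=None):
    """Randomly allocate participants to two arms of (almost) equal size."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Hypothetical convenience sample: the first 20 eligible patients, in order
# of appearance at the clinic.
recruited = [f"patient_{i:02d}" for i in range(1, 21)]
arms = randomize(recruited, seed=1)
print(len(arms["intervention"]), len(arms["control"]))
```

The allocation step is random, but the 20 recruited patients are still simply those who happened to appear first, which is why the resulting groups are comparable to each other without being representative of the target population.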

Purposive sampling: this is used when a diverse sample is necessary or the opinion of experts in a particular field is the topic of interest. This technique was used in the study by Roubille et al., in which recommendations for the treatment of comorbidities in patients with rheumatoid arthritis, psoriasis, and psoriatic arthritis were made based on the opinion of a group of experts.6

Quota sampling: according to this sampling technique, the population is first classified by characteristics such as gender, age, etc. Subsequently, sampling units are selected to complete each quota. For example, in the study by Larkin et al., the combination of vemurafenib and cobimetinib versus placebo was tested in patients with locally advanced melanoma, stage IIIC or IV, with BRAF mutation.7 The study recruited 495 patients from 135 health centers located in several countries. In this type of study, each center has a "quota" of patients.

"Snowball" sampling: in this case, the researcher selects an initial group of individuals. Then, these participants indicate other potential members with similar characteristics to take part in the study. This is frequently used in studies investigating special populations, for example, those including illicit drug users, as was the case of the study by Gonçalves et al., which assessed 27 users of cocaine and crack in combination with marijuana.8

PROBABILISTIC SAMPLING

In the context of probabilistic sampling, all units of the target population have a nonzero probability of taking part in the study. If all participants are equally likely to be selected, equiprobabilistic sampling is being used, and the probability of being selected by the research team may be expressed by the formula P = 1/N, where P is the probability of taking part in the study and N corresponds to the size of the target population. The main types of probabilistic sampling are described below.

Simple random sampling: in this case, we have a full list of sample units or participants (sample basis), and we randomly select individuals using a table of random numbers. An example is the study by Pimenta et al., in which the authors obtained from the Health Department a listing of all elderly people enrolled in the Family Health Strategy and, by simple random sampling, selected a sample of 449 participants.9
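A simple random sample can be sketched as follows. This example swaps the table of random numbers for Python's pseudorandom generator, which serves the same purpose. The frame of 5,000 numbered entries is hypothetical (the size of the listing used by Pimenta et al. is not given here); only the sample size of 449 comes from the text.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Select n units from the frame without replacement, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sample frame: a numbered list of 5,000 registered people.
frame = list(range(1, 5001))
selected = simple_random_sample(frame, 449, seed=42)
print(len(selected), len(set(selected)))
```

Because the selection is made without replacement, no person can be drawn twice, so both printed counts are 449.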

Systematic random sampling: in this case, participants are selected at fixed, previously defined intervals from a ranked list of participants. For example, in the study by Kelbore et al., children who were seen at the Pediatric Dermatology Service were selected to evaluate factors associated with atopic dermatitis, always selecting every second child in order of consultation.10
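Selection at fixed intervals can be sketched as below. The random starting point is a common convention and an assumption here, since the text only states that every second child was selected. The list of 30 children and the function name `systematic_sample` are hypothetical.

```python
import random


def systematic_sample(ordered_frame, interval, seed=None):
    """Pick a random start within the first interval, then every interval-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(interval)
    return ordered_frame[start::interval]


# Hypothetical consultation list of 30 children, in order of arrival.
children = [f"child_{i:02d}" for i in range(1, 31)]
every_second = systematic_sample(children, interval=2, seed=7)
print(len(every_second))
```

With 30 children and an interval of 2, either starting point yields 15 selected children.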

Stratified sampling: in this type of sampling, the target population is first divided into separate strata. Then, samples are selected within each stratum, either through simple or systematic sampling. The total number of individuals to be selected in each stratum can be fixed or proportional to the size of each stratum. Each individual may be equally likely to be selected to participate in the study. However, the fixed method usually involves the use of sampling weights in the statistical analysis (inverse of the probability of selection, or 1/P). An example is the study conducted in South Australia to investigate factors associated with vitamin D deficiency in preschool children. Using the national census as the sample frame, households were randomly selected in each stratum and all children in the age group of interest identified in the selected houses were investigated.11
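The role of sampling weights in the fixed method can be made concrete with a short sketch. The two strata, their sizes, and the function `stratified_sample` are hypothetical; the weight is computed as the inverse of the selection probability, 1/P, as described in the text.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a fixed number of units per stratum and attach sampling weights.

    strata: dict mapping stratum name -> list of units.
    Returns a list of (stratum, unit, weight) tuples.
    """
    rng = random.Random(seed)
    selected = []
    for name, units in strata.items():
        # P = n_per_stratum / len(units), so the weight 1/P is
        # len(units) / n_per_stratum.
        weight = len(units) / n_per_stratum
        for unit in rng.sample(units, n_per_stratum):
            selected.append((name, unit, weight))
    return selected


# Hypothetical strata of unequal size.
strata = {
    "urban": [f"u{i}" for i in range(900)],
    "rural": [f"r{i}" for i in range(100)],
}
sample = stratified_sample(strata, n_per_stratum=50, seed=3)
weights = {name: w for name, _, w in sample}
print(weights)
print(sum(w for _, _, w in sample))
```

Each sampled urban resident stands for 18 people and each rural resident for 2, so the weights add up to the total population of 1,000. Without the weights, the rural stratum would be overrepresented in any pooled estimate.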

Cluster sampling: in this type of probabilistic sampling, groups such as health facilities, schools, etc., are sampled. In the above-mentioned study, the selection of households is an example of cluster sampling.11
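In cluster sampling the unit drawn at random is the group, not the individual. A minimal sketch, assuming a hypothetical frame of 10 households with 3 children each and an illustrative function `cluster_sample`:

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members


# Hypothetical frame: 10 households with 3 children each.
households = {
    f"household_{h}": [f"child_{h}_{c}" for c in range(3)] for h in range(10)
}
chosen, children = cluster_sample(households, n_clusters=4, seed=11)
print(len(chosen), len(children))
```

All children in the four selected households enter the sample, giving 12 children. Children from households that were not drawn are never selected individually.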

Complex or multi-stage sampling: this probabilistic sampling method combines different strategies in the selection of the sample units. An example is the study by Duquia et al. to assess the prevalence of and factors associated with the use of sunscreen in adults. The sampling process included two stages.12 Using the 2000 Brazilian demographic census as the sampling frame, all 404 census tracts from Pelotas (Southern Brazil) were listed in ascending order of family income. A sample of 120 tracts was systematically selected (first sampling stage units). In the second stage, 12 households in each of these census tracts (second sampling stage units) were systematically drawn. All adult residents in these households were included in the study (third sampling stage units). All these stages have to be considered in the statistical analysis to provide correct estimates.
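The two systematic stages can be sketched as follows. This is an illustration under assumptions, not the authors' actual procedure: the helper `systematic_indices`, the fractional-interval approach, the seed, and the figure of 300 households per tract are all hypothetical. Only the counts of 404 tracts, 120 selected tracts, and 12 households per tract come from the text.

```python
import random


def systematic_indices(frame_size, n, rng):
    """Return n positions spread evenly over a frame, with a random start."""
    interval = frame_size / n  # may be fractional, e.g. 404 / 120
    start = rng.random() * interval
    return [min(int(start + i * interval), frame_size - 1) for i in range(n)]


rng = random.Random(2000)

# Stage 1: 404 census tracts, assumed to be already sorted by family income.
tracts = [f"tract_{t:03d}" for t in range(404)]
selected_tracts = [tracts[i] for i in systematic_indices(len(tracts), 120, rng)]

# Stage 2: 12 households per selected tract. The 300 households per tract
# is a made-up figure; real tracts differ in size.
HOUSEHOLDS_PER_TRACT = 300
selected_households = [
    (tract, household)
    for tract in selected_tracts
    for household in systematic_indices(HOUSEHOLDS_PER_TRACT, 12, rng)
]
print(len(selected_tracts), len(selected_households))
```

Because the tract list is ordered by income before the systematic draw, the 120 selected tracts are spread across the whole income range. This gives 1,440 households, in which all adult residents would then be interviewed.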

NONRESPONDENTS

Frequently, sample sizes are increased by 10% to compensate for potential nonresponses (refusals/losses).1 Let us imagine that in a study to assess the prevalence of premalignant skin lesions there is a higher percentage of nonrespondents among men (10%) than among women (1%). If the higher percentage of nonresponse occurs because these men are not at home during the scheduled visits, and these participants are more likely to be exposed to the sun, the prevalence of skin lesions will be underestimated. For this reason, it is strongly recommended to collect and describe some basic characteristics of nonrespondents (sex, age, etc.) so they can be compared to the respondents to evaluate whether the results may have been affected by this systematic error.
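Both points, the 10% inflation and the bias from differential nonresponse, can be illustrated with simple arithmetic. All counts below are hypothetical and chosen only to match the 10% and 1% nonresponse percentages in the example. In a real study the number of cases among nonrespondents is unknown, which is precisely the problem.

```python
def inflate_sample_size(n_required, nonresponse_pct=10):
    """Increase the required sample size by a whole-number percentage, rounding up."""
    return (n_required * (100 + nonresponse_pct) + 99) // 100


def prevalence(cases, total):
    """Prevalence as a percentage."""
    return 100.0 * cases / total


# Hypothetical counts: 10% of men and 1% of women do not respond, and the
# nonresponding men (more sun exposure) have more lesions than the rest.
groups = {
    "men": {"invited": 200, "nonresp": 20, "cases_resp": 18, "cases_nonresp": 6},
    "women": {"invited": 200, "nonresp": 2, "cases_resp": 16, "cases_nonresp": 0},
}

respondents = sum(g["invited"] - g["nonresp"] for g in groups.values())
cases_observed = sum(g["cases_resp"] for g in groups.values())
cases_all = cases_observed + sum(g["cases_nonresp"] for g in groups.values())
invited = sum(g["invited"] for g in groups.values())

observed = prevalence(cases_observed, respondents)
true_value = prevalence(cases_all, invited)

print(inflate_sample_size(400))
print(f"observed among respondents: {observed:.1f}%")
print(f"value if everyone had responded: {true_value:.1f}%")
```

With these assumed counts, a required sample of 400 becomes 440 invitations. The prevalence observed among respondents (about 9.0%) is lower than the 10.0% that would have been found had everyone taken part.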

Often, in study protocols, refusal to participate or to sign the informed consent is considered an "exclusion criterion". However, this is not correct, as these individuals are eligible for the study and need to be reported as "nonrespondents".

SAMPLING METHOD ACCORDING TO THE TYPE OF STUDY

In general, clinical trials aim to obtain a homogeneous sample, which is not necessarily representative of any target population. Clinical trials often recruit those participants who are most likely to benefit from the intervention.3 Thus, the stricter inclusion and exclusion criteria in clinical trials often make it difficult to locate participants: after verification of the eligibility criteria, just one out of ten possible candidates will enter the study. Therefore, the results of clinical trials usually cannot be generalized to the entire population of patients with the disease, but only to those with characteristics similar to the sample included in the study. These peculiarities of clinical trials justify the need to conduct multicenter and/or global studies to accelerate recruitment and to reach, in a shorter time, the number of patients required for the study.13

In turn, in observational studies, building a solid sampling plan is important because of the great heterogeneity usually observed in the target population. This heterogeneity also has to be reflected in the sample. A cross-sectional population-based study aiming to assess disease estimates or identify risk factors often uses complex probabilistic sampling, because sample representativeness is crucial. However, in a case-control study, we face the challenge of selecting two different samples for the same study. One sample is formed by the cases, which are identified based on the diagnosis of the disease of interest. The other consists of controls, which need to be representative of the population that originated the cases. Improper selection of control individuals may introduce selection bias in the results. Thus, the concern with representativeness in this type of study is established based on the relationship between cases and controls (comparability).

In cohort studies, individuals are recruited based on the exposure (exposed and unexposed subjects), and they are followed over time to evaluate the occurrence of the outcome of interest. At baseline, the sample can be selected from a representative sample (population-based cohort studies) or a non-representative sample. However, in the successive follow-ups of the cohort members, study participants must be a representative sample of those included at baseline.14,15 In this type of study, losses over time may cause follow-up bias.

CONCLUSION

Researchers need to decide during the planning stage of the study if they will work with the entire target population or a sample. Working with a sample involves different steps, including sample size estimation, identification of the sample frame, and selection of the sampling method to be adopted.

Footnotes

Financial Support: None.

*Study performed at Faculdade Meridional - Escola de Medicina (IMED) - Passo Fundo (RS), Brazil.

REFERENCES

1. Martínez-Mesa J, González-Chica DA, Bastos JL, Bonamigo RR, Duquia RP. Sample size: how many participants do I need in my research? An Bras Dermatol. 2014;89:609–615.

2. Röhrig B, du Prel JB, Wachtlin D, Kwiecien R, Blettner M. Sample size calculation in clinical trials: part 13 of a series on evaluation of scientific publications. Dtsch Arztebl Int. 2010;107:552–556.

3. Suresh K, Thomas SV, Suresh G. Design, data analysis and sampling techniques for clinical research. Ann Indian Acad Neurol. 2011;14:287–290.

4. Rothman KJ, Gallacher JE, Hatch EE. Why representativeness should be avoided. Int J Epidemiol. 2013;42:1012–1014.

5. Krause M, Lutz W, Boehnke JR. The role of sampling in clinical trial design. Psychother Res. 2011;21:243–251.

6. Roubille C, Richer V, Starnino T, McCourt C, McFarlane A, Fleming P, et al. Evidence-based Recommendations for the Management of Comorbidities in Rheumatoid Arthritis, Psoriasis, and Psoriatic Arthritis: Expert Opinion of the Canadian Dermatology-Rheumatology Comorbidity Initiative. J Rheumatol. 2015;42:1767–1780.

7. Larkin J, Ascierto PA, Dréno B, Atkinson V, Liszkay G, Maio M, et al. Combined vemurafenib and cobimetinib in BRAF-mutated melanoma. N Engl J Med. 2014;371:1867–1876.

8. Goncalves JR, Nappo SA. Factors that lead to the use of crack cocaine in combination with marijuana in Brazil: a qualitative study. BMC Public Health. 2015;15:706–706.

9. Pimenta FB, Pinho L, Silveira MF, Botelho AC. Factors associated with chronic diseases among the elderly receiving treatment under the Family Health Strategy. Cien Saude Colet. 2015;20:2489–2498.

10. Kelbore AG, Alemu W, Shumye A, Getachew S. Magnitude and associated factors of Atopic dermatitis among children in Ayder referral hospital, Mekelle, Ethiopia. BMC Dermatol. 2015;15:15–15.

11. Zhou SJ, Skeaff M, Makrides M, Gibson R. Vitamin D status and its predictors among pre-school children in Adelaide. J Paediatr Child Health. 2015;51:614–619.

12. Duquia RP, Menezes AM, Almeida HL, Jr, Reichert FF, Santos Ida S, Haack RL, et al. Prevalence of sun exposure and its associated factors in southern Brazil: a population-based study. An Bras Dermatol. 2013;88:554–561.

13. Barrios CH, Werutsky G, Martinez-Mesa J. The global conduct of cancer clinical trials: challenges and opportunities. Am Soc Clin Oncol Educ Book. 2015:e132–e139.

14. Victora CG, Barros FC. Cohort profile: the 1982 Pelotas (Brazil) birth cohort study. Int J Epidemiol. 2006;35:237–242.

15. Boing AC, Peres KG, Boing AF, Hallal PC, Silva NN, Peres MA. EpiFloripa Health Survey: the methodological and operational aspects behind the scenes. Rev Bras Epidemiol. 2014;17:147–162.

Articles from Anais Brasileiros de Dermatologia are provided here courtesy of Sociedade Brasileira de Dermatologia
