Over the course of almost three decades, I have been a devoted student of two things: research methods and doctoral education in business. The former has been a passion of mine since the early days of my doctoral studies because it all just seemed to make so much sense. Follow the procedure, understand the relationship of things, get the answer, move on. Research methods are logical, procedural and generalizable. All the important checkboxes for a natural born positivist like myself. I even grew to appreciate that research methodologies had something for everyone – the positivist, the interpretivist, the critical thinker – everyone. I am still learning but I’m starting to feel pretty comfortable about what I have learned. Comfortable enough, at least, to teach others to learn to love them, too.
The latter, doctoral education in business, has been an equal passion partly because of my devotion to business as a discipline but more because of its unique nature. Of all the educational channels and endeavors that exist for us in a college of business, doctoral education is truly the only one that is essentially handcrafted from beginning to end. Whether it be a PhD or a DBA program, the education process is better envisioned as an apprenticeship than as a conventional student-teacher relationship, and it is often different for each individual seeking the degree. So, when you bring together my two great intellectual passions, I get particularly excited. That is why I teach research methods at the doctoral level.
During this life-long endeavor, I have found that students struggle with a few concepts that necessarily take a bit of explaining. Validity, for example, is a common struggle for business doctoral students. Experimental design comes easily to some and just plain confounds others. But the one nearly universal struggle for business doctoral students is the difference between a control variable and a moderating variable. I spend a lot of time discussing these concepts with my doctoral students, so I thought it best just to write it all down.
I mean, they both control, so what’s the difference? It’s easy when you understand. So, let’s understand.
The Control Variable
In its simplest definition, a control variable is a variable (think something that can vary) that is controlled for in some manner during a research study. Often, the method we use to control for this variable is to hold it constant – just prevent it from varying. It is also likely a variable that is not of particular interest to our study but could arguably affect what we are interested in studying. In other words, it could influence the outcome of our study.
Why do control variables matter? They matter because control variables serve to enhance the internal validity of a study by making sure that nothing gets in the way of what effect or relationship we are trying to observe. This helps you establish the hypothesized causal relationship between your variables of interest – which is why you are doing all this.
Unlike the independent and dependent variables, which you want to vary as much as possible, everything else that could affect your results should be considered and controlled. If you don’t control the stuff that doesn’t matter to you, that same stuff may suddenly matter, and you may not be able to demonstrate that it didn’t influence your results.
In many of the hard sciences, a control variable is a variable that can actually be controlled. Holding temperature or light or moisture or whatever at a specific level, to keep it from affecting the observed results, is a basic example. There are many such variables in the hard sciences, and the scientists have become adept at developing methods of controlling them.
In the behavioral or social sciences, however, we regularly encounter variables that need to be controlled even though we don’t have any real control over them. Variables like age, gender, socio-economic status, culture, and personality, among many others, are commonly encountered by behavioral researchers. How do we hold variables like these constant? We can’t force a person to be a certain age or gender.
Fear not, we social scientists have become pretty adept at controlling variables, too. We simply have to approach the problem from a different perspective. In our world, we have two primary choices for control: sampling and statistics.
Controlling by Sample
Let’s say we are interested in testing the perception of a new birth control pill that is coming on the market. It doesn’t take too long to begin seeing variables that could affect such perceptions – age and gender are two that may come to mind. If you are too young or too old to get pregnant, then you likely couldn’t care less about this new product. If you can’t get pregnant because of your gender (read, guys), then we really don’t care what you think about our product. So, how do we control for age and gender in our survey study? We simply sample them away.
We create a sample frame that is focused on our population of interest – women who are of child-bearing age. If you are a male, you won’t get the survey. If you are too young or too old to have children, you won’t get the survey. We have controlled for variables that could affect the perception of our new product by sampling them away.
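As a rough illustration of controlling by sample, the filter below builds a sample frame from a small made-up population. The Person record, the names, and the 18 to 45 age range are assumptions introduced only for this sketch; a real study would define child-bearing age from its own criteria.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    gender: str
    age: int


# Hypothetical bounds for "child-bearing age"; chosen only for illustration.
MIN_AGE, MAX_AGE = 18, 45


def in_sample_frame(person: Person) -> bool:
    """Keep only people in the population of interest."""
    return person.gender == "female" and MIN_AGE <= person.age <= MAX_AGE


# A tiny, invented population.
population = [
    Person("Ana", "female", 29),
    Person("Bob", "male", 34),
    Person("Cat", "female", 67),
    Person("Dee", "female", 22),
    Person("Eve", "female", 12),
]

# Age and gender are "sampled away": only eligible people get the survey.
sample_frame = [p for p in population if in_sample_frame(p)]
print([p.name for p in sample_frame])
```

Because everyone left in the frame is a woman within the chosen age range, gender no longer varies at all and age varies only within that range, so neither can account for much of the difference in perceptions.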
Controlling by Statistics
Suppose we want to test the effectiveness of a new vitamin supplement on mental acuity and alertness. Definitely there are some variables we need to control for here – age and prior level of mental acuity, among many others. What we also need to think about, however, is what else could affect mental acuity and alertness – your diet, your sleep, your meal timing, your time watching TV or playing with your phone, and how much coffee you drink, just to name a few.
It is possible to conceive of directly controlling sleep or coffee intake or TV time, but not easily. We could standardize everyone’s diet, meal timing, TV watching, and coffee drinking by locking them in a room, but that is both silly and impractical except in rigorously controlled experiments. Instead, we control for all these things by simply measuring as many of them as we can think of and then including them in a regression model along with our independent variables of interest. Using this approach, the regression estimates the effect of each control variable on our dependent variable and statistically holds it constant, so the coefficient on our independent variable of interest reflects what is left once the controls are accounted for. If we have done this correctly and with care, we will be able to see what we want to see without worrying about what we do not want to see.
The exact steps and procedures for accomplishing this are beyond the scope of this discourse but any good statistics book will guide you in the process.
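Still, a small sketch can make the idea of statistical control more concrete. The example below uses simulated data, not results from any study: every number, including the "true" supplement effect of 2.0, is invented, and sleep is deliberately made to be related to who takes the supplement so that ignoring it distorts the estimate. It uses ordinary least squares via NumPy; the helper name ols is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated controls (invented distributions).
sleep = rng.normal(7.0, 1.0, n)    # hours of sleep per night
coffee = rng.normal(2.0, 1.0, n)   # cups of coffee per day

# Assumption for the sketch: people who sleep more are also more
# likely to take the supplement, so sleep is tangled up with it.
supplement = (sleep + rng.normal(0.0, 1.0, n) > 7.0).astype(float)

# Invented data-generating process: the true supplement effect is 2.0.
acuity = (50.0 + 2.0 * supplement + 3.0 * sleep + 1.0 * coffee
          + rng.normal(0.0, 2.0, n))


def ols(y, *predictors):
    """Least-squares coefficients; index 0 is the intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


# Without controls, the supplement coefficient absorbs the effect of sleep.
naive = ols(acuity, supplement)[1]

# With sleep and coffee in the model, their effects are held constant.
controlled = ols(acuity, supplement, sleep, coffee)[1]

print(f"supplement effect, no controls:   {naive:.2f}")
print(f"supplement effect, with controls: {controlled:.2f}")
```

In this simulation the estimate with controls should land close to the 2.0 that was built into the data, while the estimate without controls should be noticeably larger because it is also picking up the effect of sleep.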
The Moderating Variable
Ok, so we understand the fundamentals of control variables and how to deal with them. But what if a variable presents the possibility of affecting a relationship between an independent and a dependent variable instead of affecting the value of another variable? This is a special kind of variable referred to as a moderating or moderator variable. Instead of the arrow in your model pointing to another variable, the arrow coming from a moderating variable points to another arrow – in other words, the effect is on the relationship instead of on the variable itself.
Moderating variables can be confused with control variables because they are often the same variable. Age, gender, culture, socio-economic status, personality, marital status, education level, ethnicity, etc. can be either control variables or moderating variables. You decide which is which by asking yourself whether the effect of the variable in question will be directly on another variable in your model or on a relationship in your model. If it is the former, then it is a control variable. If the latter, it is a moderator. Once you make the decision, you then model the variable appropriately and test it just like any other relationship in your model. Remember, every arrow in your model is a hypothesis.
So, control variables are measured but their effect is not hypothesized. Moderator variables are measured, and their effect is hypothesized.
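One common way to write this distinction down is in regression notation. The symbols here are illustrative assumptions rather than anything defined earlier: Y is the dependent variable, X is the independent variable of interest, and Z is the third variable in question.

```latex
% Z as a control variable: its arrow points at Y itself
Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon

% Z as a moderator: its arrow points at the X -> Y arrow
Y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 (X \times Z) + \varepsilon
```

In the second form, the effect of X on Y works out to \beta_1 + \beta_3 Z, so it changes with the level of Z. The coefficient \beta_3 on the product term is the hypothesized moderating effect.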
Time for an example.
Suppose we are interested in the effect of a person’s perceived level of stress in their life on their resting heart rate. We hypothesize that subjects who perceive high levels of stress in their life will have a higher resting heart rate than those who perceive relatively low levels of stress. Time to think about control and moderating variables.
What else could affect a person’s resting heart rate? The amount of exercise they get, their age, gender, race, and maybe many others. So, are these candidates control variables or moderators?
One could argue them either way. You could consider age as a control variable, measure it, and statistically account for it in the regression. You could do the same with gender. The problem with doing this is that anyone of any age or gender can have perceived stress in their life. So, it seems more reasonable that age or gender could alter the effect of stress on resting heart rate rather than directly affect a person’s resting heart rate. Therefore, we make the decision to model age and gender as moderating variables and hypothesize their effect on the relationship between perceived stress and resting heart rate. We then proceed to test them like any other hypothesis in our model.
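A moderator is typically tested by adding a product (interaction) term to the regression. The sketch below does this for stress and age using simulated data. None of the numbers come from real research: the stress scale, the age range, and the built-in interaction of 0.05 are all invented for illustration, and the helper stress_slope is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

stress = rng.uniform(0.0, 10.0, n)   # invented 0-10 perceived stress score
age = rng.uniform(20.0, 70.0, n)     # invented age range
age_c = age - age.mean()             # center age so the stress slope is read at average age

# Invented data-generating process: the effect of stress on heart rate
# grows with age (this is the moderation built into the fake data).
TRUE_INTERACTION = 0.05
heart_rate = (60.0 + 1.0 * stress + 0.1 * age_c
              + TRUE_INTERACTION * stress * age_c
              + rng.normal(0.0, 3.0, n))

# Model: heart_rate ~ stress + age + stress x age
X = np.column_stack([np.ones(n), stress, age_c, stress * age_c])
b0, b_stress, b_age, b_interaction = np.linalg.lstsq(X, heart_rate, rcond=None)[0]


def stress_slope(at_age):
    """Estimated effect of one unit of stress on heart rate at a given age."""
    return b_stress + b_interaction * (at_age - age.mean())


slope_30 = stress_slope(30.0)
slope_60 = stress_slope(60.0)

print(f"interaction coefficient: {b_interaction:.3f}")
print(f"stress slope at age 30:  {slope_30:.2f}")
print(f"stress slope at age 60:  {slope_60:.2f}")
```

The coefficient on the product term is the test of the moderation hypothesis. If it is reliably different from zero, the effect of stress on resting heart rate depends on age, which is exactly what an arrow pointing at another arrow claims.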
Moderating variables are important because they often help us see beyond whether two variables are related to when, or for whom, the relationship is stronger, weaker, or absent. They provide insight into a relationship that we could not achieve without including them in the model.
This explanation of the difference between control variables and moderating variables is admittedly not very deep but it is intended to provide a basic explanation of each and how to tell them apart.
Once you understand the fundamental differences, the nuances associated with modeling and testing them become less onerous and easier to deal with.
So, for the sake of simplicity, think of moderating variables as catalysts that make a relationship stronger or weaker. Control variables are variables that you are not particularly interested in but that can have a material effect on your dependent variable, so you want to separate their effects from the effect of your independent variable of interest.
It’s always easier when you understand.