Saturated-Fat Epidemiology

Here, at Free the Animal, are three scatterplots that show better health (less heart disease, less stroke) correlated with more saturated fat (= animal fat) in the diet. Each point is a different European country (Albania, Bulgaria, etc.). Small and large countries show the same relationship.

The obvious confounding is with wealth: rich people eat more meat than poor people. Were this data submitted for publication, I imagine someone would say “How dare you fail to account for that!” and reject the paper. That would be a mistake, because it is hard to look at this data and continue to think that saturated fat is the evil it is made out to be. And of course whatever the weaknesses of my sleep/fat experiment (which showed animal fat improved my sleep), confounding with wealth was not one of them.

23 Replies to “Saturated-Fat Epidemiology”

  1. A journal would be right to reject a paper based on this data that did not control for wealth. An analysis that accounted for wealth would be much more informative, and it wouldn’t be hard to do. GDP per capita data are even available on Wikipedia – it shouldn’t even take an hour to add them to your spreadsheet and run a few regressions (I might have done it myself, if he’d posted a table of his data). And it’s important to do – lots of spurious correlations come up in between-countries data, since so many things are correlated with each other. Taking a look at his graph which names the countries and puts them in order of saturated fat consumption, the correlation between sat-fat% and wealth looks to be extremely high, and I wouldn’t be surprised if the correlation between saturated fat and health goes away completely if you control for wealth. Some of the outliers on his graph, like Israel and Turkmenistan, jump out as countries where wealth doesn’t match saturated fat consumption.

  2. I slept pretty damn well living in Argentina. Of course there are a lot of variables to account for, but I had a large bife de chorizo (flank steak) almost every day. The cuts of meat there were loaded with fat. Juicy, buttery, mouth-watering fat.

    Now I can’t get good cuts anywhere. I compromise with daily use of coconut oil.

  3. Vince, there is such a thing as overcorrection. X and Y and Z are all correlated, but when you control for Z the correlation between X and Y goes away. Not because the correlation is spurious but because X causes both Y and Z.

    It is foolish to look at a data set and start by asking how it might be misleading. It is better to start by asking what can be learned from it.

  4. Seth, I’d guess that wealth has a big impact on diet and on health, and that the effect of diet on health is smaller (especially if you’re only looking at a single nutrient). That means that it’s hard to know what the data set is telling you if you just look at the correlation between diet and health. I favor looking at a data set in the way that’s most likely to give meaningful results, adding in more data if that’s helpful and practicable, and being careful in considering whether some ways of analyzing a data set are likely to be misleading.

    I decided to run the numbers myself, seeing if this relationship between saturated fat consumption and health holds up after controlling for wealth. It doesn’t. The raw correlation between Saturated Fat Consumption as a Percentage of Total Calories (SatFat%) and Disability-Adjusted Life Years (DALY) is large, r = -.69 (p less than .0001), but after controlling for GDP Per Capita (GDP/person) it basically becomes zero (it actually reverses direction, with more saturated fat associated with worse health outcomes, but it’s nowhere close to statistically significant).

    All the numbers that I needed except for GDP per capita were in Alex’s first post. His first graph has labels with the SatFat% for 45 countries, and his last graph has labels with the DALY figures for those countries*. I copied those numbers into a spreadsheet, and then added the GDP Per Capita numbers (from the IMF, 2008). If you’re interested, I could email you my data set.

    GDP/person correlated strongly with both SatFat% (r = .86) and DALY (r = -.84). And in a linear regression predicting DALY from both SatFat% and GDP/person, only GDP/person was a significant predictor (F(1,42) = 33.9, p less than .0001); SatFat% was nowhere near significant (F(1,42) = .49, p = .49). This model explains 71% of the variance in DALY (R^2 = .710). The regression equation says that every additional $1000 in GDP/person is associated with 98 fewer DALYs, and each 1 percentage point increase in SatFat% is associated with 54 additional DALYs (but this is not significantly different from zero).

    I tried a few variations on this analysis, seeing if transforming the variables made for a better model. Using the log of DALY instead of DALY is an improvement: the pairwise correlations are stronger (-.75 and -.89 instead of -.69 and -.84), R^2 for the regression model goes up (.787 instead of .710), and Norway is no longer predicted to have a negative number of DALY. The pattern of regression results remains the same: GDP/person is a strong predictor of log(DALY) (F = 43.7), and SatFat% has a very slight positive association with log(DALY) that is not close to being statistically significant (F = .05, p = .83).

    These analyses suggest that wealth (or something closely correlated with wealth) has a big impact on saturated fat consumption and on health, and that whatever impact saturated fat consumption has on health is too small to show up in this data set.

    *Two countries, Moldova and Macedonia, were missing from the last graph, so I used their approximate DALY from the first graph, which plots DALY on the y-axis but doesn’t label the exact number.
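    A minimal sketch of this kind of analysis, for readers who want to see the mechanics. It does not use the 45-country data set: the numbers are synthetic, generated so that DALY depends on GDP/person only (the slope of -98 per $1000 is borrowed from the comment above; the intercept, noise levels, and the SatFat%-GDP relationship are made-up assumptions). The helper names `pearson_r` and `ols_two_predictors` are hypothetical.

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ols_two_predictors(y, x1, x2):
    """Least-squares fit of y = a + b1*x1 + b2*x2; returns (a, b1, b2)."""
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def make_synthetic_countries(n=45, seed=1):
    """Made-up data: GDP drives both SatFat% and DALY; SatFat% has no effect."""
    rng = random.Random(seed)
    gdp = [rng.uniform(2, 60) for _ in range(n)]            # $1000s per person
    satfat = [4 + 0.15 * g + rng.gauss(0, 1) for g in gdp]  # % of calories
    daly = [7000 - 98 * g + rng.gauss(0, 500) for g in gdp]  # no satfat term
    return gdp, satfat, daly


gdp, satfat, daly = make_synthetic_countries()
raw_r = pearson_r(satfat, daly)
intercept, b_satfat, b_gdp = ols_two_predictors(daly, satfat, gdp)

print(f"raw r(SatFat%, DALY)      = {raw_r:.2f}")
print(f"DALY per point of SatFat% = {b_satfat:.1f} (controlling for GDP)")
print(f"DALY per $1000 GDP/person = {b_gdp:.1f} (controlling for SatFat%)")
```

    In this synthetic world the raw correlation between SatFat% and DALY is strongly negative even though saturated fat has no effect by construction. That illustrates why the raw correlation alone cannot settle the question; it says nothing about which causal story is true of the real data.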

  5. “These analyses suggest that wealth (or something closely correlated with wealth) has a big impact on saturated fat consumption and on health, and that whatever impact saturated fat consumption has on health is too small to show up in this data set.”

    Vince, thanks for the further analyses. Here’s an alternative explanation of your results: When people make more money, they eat more saturated fat. The increase in saturated fat intake improves their health. Perhaps this is what the data set is telling us. Perhaps it is saturated fat intake that is the “something closely correlated with wealth” that has a “big impact” on health.

    One way to look at it is this. Suppose I have a theory: more saturated fat –> better health. This data set could have contradicted that theory; the raw correlation could have been zero or negative. By passing a test, my theory gains credence. How much credence it gains depends on what else you believe. But to say it gains zero credence, as you seem to, is to place too much faith in your beliefs. You’d have to know what that “something closely correlated with wealth” is, which obviously no one knows.

    It’s fine to notice the glass is half empty but to fail to notice it’s half full is to miss the main point. When you learned correlational data analysis, did no one ever discuss overcorrection, or point out the alternative interpretation that I just described?

  6. If I have a theory that one feature of rich countries causes another feature of rich countries, and I look at the correlation between those two variables, then it’s a pretty safe bet that the correlation is going to be there regardless of whether or not my theory is correct. Wealth, health, education, energy usage, meat consumption, political freedom, gender equality, age at first marriage, and a bunch of other variables should all be correlated with each other (looking just at the raw correlations), and that tells us very little about the causal relationships between them. Once you know that X is a feature of wealthy countries, there really isn’t much chance of failing to find a raw correlation between X and another feature of wealthy countries, so that correlation isn’t very informative as a test of a theory. You need to do other analyses to start to tease apart the relationships between the variables in order to gain (or lose) credence in the theory.

    If saturated fat causes substantial improvements in health, then for equally rich countries we would expect the ones that eat more saturated fat to be healthier. That’s what the regression looks at – holding wealth constant, are countries that eat more saturated fat healthier? The answer, in this data set, is that they aren’t. If two countries are equally rich, and the people of one country eat much more saturated fat than the other, the best guess is that they are equally healthy (if anything, you’d be better off guessing that the one that eats more saturated fat is slightly less healthy). If saturated fat consumption explained the relationship between wealth and health, we’d expect it to remain significant in the regression – we might even expect wealth to become nonsignificant, since if wealth only matters as a way to increase a country’s saturated fat intake, once you know how much saturated fat they eat then knowing their wealth won’t tell you anything useful. (That’s why psychologists commonly use regression to test for mediation.) So I don’t think that the data fit your alternative explanation.

    None of this means that your theory is wrong – there are plenty of reasons why we might find no relationship between saturated fat consumption and health in the regression even if saturated fat does improve (or worsen) health. Maybe the effect of saturated fat on health isn’t big enough (relative to the other factors that influence a country’s health) to explain a significant amount of the variability between countries, but it still has enough of an effect to be important to individuals. Maybe there are other relevant variables that we aren’t accounting for (e.g. countries that eat more saturated fat may also tend to eat more X, and X could be good or bad for people’s health). Maybe we aren’t using the most relevant measure of health (saturated fats could make people healthier or less healthy in some way that doesn’t show up in the DALYs), or we aren’t using the most relevant measure of saturated fat consumption (maybe total saturated fat eaten is more important than the percentage of your calories that come from saturated fat). None of these, though, are good reasons to favor the raw correlation over the regression results, or to favor your theory over the standard theory (that saturated fats are harmful). They could be good reasons to pay more attention to your self-experimentation and other individual-level data instead of country-level data.

  7. Before I saw this data set I knew that people in rich countries eat more meat. But I didn’t know that they had less heart disease and less stroke. That’s why this data set is informative. I have no idea how you could have known that to be true, as you seem to be saying. Lots of diseases become more common with wealth. There’s a whole category just for them: diseases of civilization.

    “If saturated fat consumption explained the relationship between wealth and health, we’d expect it to remain significant in the regression.” Where in the world did you get this idea? “Significant” is a highly arbitrary criterion, as I’m sure you know. There is error in everything, as I’m sure you know. Have you ever encountered a discussion of overcorrection?

    And the bigger question is: What do you think can be learned from this data set?

  8. Vince,

    Would you mind sharing your data set, perhaps posting it as a Google-Docs spreadsheet? I’d like to take a look at the set, without duplicating effort if possible. Thanks!

    Cheers,
    Tom

  9. Tom, here’s the data set on Google Docs (hope that worked).

    Seth, here’s my general take on using this data set to test your theory. We know that there are many ways in which richer, more developed countries differ from poorer, less developed countries. So when you’re interested in looking at some variable between countries, it’s important to check whether it’s one of the many variables that are associated with development level. If it is, then it will be correlated with the other variables that go with development level, and those correlations (on their own) won’t be very informative about the causal relationship between a specific pair of variables since they just show that both are part of the same package (development). As a first, simple step to see if two specific variables are causally related, you can run a regression controlling for some measure of development (such as GDP per capita) to see if it still holds up – are they related to each other beyond what you’d expect from them both being features of developed countries?

    What can be learned from this data set? If saturated fat had been a significant predictor of health then that would have been some (fairly weak) evidence in favor of the theory that saturated fat is beneficial to your health (or harmful, if it came out in the other direction). Instead, it turned out that there was basically no relationship between saturated fat and health in the regression, which I take as (fairly weak) evidence that differences in saturated fat consumption do not play a big role in causing those health outcomes, compared to other factors that differ between countries. Further analyses could challenge that interpretation. You’re right that there’s nothing magical about the .05 cutoff for significance, and a relationship in the predicted direction that isn’t quite statistically significant but is big enough to be practically meaningful could still count as (weak) evidence in favor of the theory. In this case, arbitrary significance cutoffs aren’t the issue – the relationship between saturated fats and health in the regression is in the wrong direction for your theory, and in the analysis that seems to fit the data best (predicting log(DALY)) it is very close to zero and very far from statistical significance (p = .83).

    To answer your specific statistical questions, I’ve learned about various potential problems with regression, including overfitting and multicollinearity, but I haven’t studied “overcorrection” under that name (and a quick Google search doesn’t help). The idea that “If saturated fat consumption explained the relationship between wealth and health, we’d expect it to remain significant in the regression” basically comes straight from the standard Baron & Kenny take on mediation. Some patterns of results might suggest that there’s just not enough power for it to cross the threshold of statistical significance, but that does not seem to be what’s happening in this case.

  10. Vince, thanks for answering my questions. Baron & Kenny contains a whopper of a mistake. I’ll be curious to see where they got this strange idea. This might be a good example for my statistics column.

    If we knew that wealth affects health without saturated fat having anything to do with that connection (between wealth and health), then it would be sensible to adjust for wealth and see what remains. But that isn’t true. We don’t know how wealth affects health. Having more money in your pocket or bank account doesn’t automatically make you healthier. Having more money changes behavior, and those changes are what make the difference in health. One effect of having more money is that you buy and eat more meat. This change may produce a large part of the wealth-health correlation.

    I guess most epidemiologists are unfamiliar with the concept of overcorrection. It is related to multicollinearity, of course, but I gather they haven’t understood how it can mislead them.

  11. In India, the best health indicators are in the state of Kerala, where they use a lot of coconut oil. Indicators like maternal death and infant mortality are lower in Kerala despite Kerala being less wealthy than other Indian states.
    Sri Lanka also is high in coconut oil usage and health indicators.
    Economists usually explain (away) this as owing to greater economic equality in Kerala but I don’t buy this.

  12. Seth:

    Re coconut oil, you might be interested in the Tokelau Island Migrant Study. Dr. Stephan Guyenet at Whole Health Source did a whole series on this study, and, by the way, Tokelauans traditionally got almost 50% of total energy from saturated fat (coconut fat is about 90% saturated).

    Take a guess about their health. Take another guess about the health of those who migrated to NZ and began eating neolithic foods.

    I blogged about Stephan’s series here, with all the links in one place.

    http://freetheanimal.com/2009/01/saturated-fat.html

    Also, at the top of the blog, I just began a new series on sat fat & heart disease, springing off of some strong claims by a renowned epidemiologist in New Zealand.

  13. In response to Seth’s comment from last night:

    I have no idea what epidemiologists are familiar with. My statistical training is in psychology.

    Here’s Kenny’s website on the Baron & Kenny approach to mediation (with some updates since their oft-cited 1986 paper).

    I don’t think that we need to know that wealth affects health. I’ve been thinking of GDP per capita as a proxy for level of development, so controlling for it is a way to test whether the correlation between saturated fat consumption & health is just due to the fact that they both go along with higher levels of development or if there’s a more direct relationship between them. Do you disagree with any of these 5 points?

    1. The health measures that we’ve been looking at improve at higher levels of development.
    2. Saturated fat consumption increases with higher levels of development.
    3. 1 & 2 don’t provide much support for the theory that saturated fat consumption causes health improvements.
    4. The combination of 1 & 2 implies that saturated fat consumption will be correlated with those health measures, so a correlation between them does not provide any additional information.
    5. The regression (controlling for GDP/person) does not provide any support for the theory that saturated fat consumption causes health improvements.

    Put those together, and it looks like we don’t have much support for your theory at the country level. Also, I don’t think that any of these points depend on the regression methodology issues that we’ve been debating.

  14. Vince, we don’t know how level of development affects health. (The effect is surely complex, since there are both diseases of affluence and diseases of poverty.) Forgive the emphasis, but you seem to keep missing this point. Until we do, there is danger of overcorrection. Perhaps greater development reduces heart disease and stroke (in this data set) because it causes more saturated fat consumption. If so, then if we “correct” for level of development we will thereby remove some or all of the effect of saturated fat on health. If we then, as you did, correlate saturated fat and the residuals (health after the “effect” of level of development has been taken out) and find zero correlation, we are fooling ourselves if we take this to mean saturated fat is unimportant.

    In case that isn’t clear, let me tell you an example that might be clearer. Suppose Little League participation improves a child’s fitness. We know this, it’s been measured. Someone comes along and finds that running laps improves fitness. A strong correlation. It so happens that Little League participation involves running laps because that’s one thing coaches do: make their players run laps. And most kids run laps only if they are in Little League. If an epidemiologist came along and factored out Little League participation from the running laps/fitness correlation, and thereby concluded that running laps doesn’t affect fitness, we’d all understand the epidemiologist was missing something.
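    The overcorrection argument can be sketched as a small simulation. Everything here is an assumption made for illustration: the variable names, the effect sizes, and the choice to make fat intake track wealth very closely (noise standard deviation 0.3). Two made-up worlds are compared: one where wealth alone drives health, and one where wealth improves health only by raising fat intake. The “adjusted” figure follows the procedure described above: remove the wealth trend from health, then correlate fat with what is left.

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(y, x):
    """What remains of y after a simple least-squares regression on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def simulate(fat_effect, wealth_effect, n=5000, seed=0):
    """Return (raw r, r after removing the wealth trend from health)."""
    rng = random.Random(seed)
    wealth = [rng.gauss(0, 1) for _ in range(n)]
    # Fat intake follows wealth closely (assumed), as laps follow Little League.
    fat = [w + rng.gauss(0, 0.3) for w in wealth]
    health = [fat_effect * f + wealth_effect * w + rng.gauss(0, 1)
              for f, w in zip(fat, wealth)]
    raw = pearson_r(fat, health)
    adjusted = pearson_r(fat, residuals(health, wealth))
    return raw, adjusted


# World A: wealth improves health directly; fat does nothing.
raw_a, adj_a = simulate(fat_effect=0.0, wealth_effect=1.0)
# World B: wealth improves health only through fat intake.
raw_b, adj_b = simulate(fat_effect=1.0, wealth_effect=0.0)

print(f"World A (fat irrelevant): raw r = {raw_a:.2f}, adjusted r = {adj_a:.2f}")
print(f"World B (fat causal):     raw r = {raw_b:.2f}, adjusted r = {adj_b:.2f}")
```

    Under these assumptions both worlds show a strong raw correlation and a small adjusted one, so neither number alone distinguishes them. One caveat: this mimics the residual-correlation procedure, not a multiple regression. With a sample this large, a regression on both predictors would recover the fat effect in World B; how well it does with 45 countries and highly correlated predictors is a separate question.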

  15. Seth, there are two questions here. First, do these data provide support for your hypothesis (sat fat is good for your health)? Second, what do these data suggest? My answer to the first is that they do not (or, at best, they provide very little support). My last comment tried to focus on that first question (these correlations don’t provide much support for the theory), but it looks like you’re mainly addressing the second.

    To put the focus back on the first question, is there a desirable health outcome that is associated with development that you are fairly confident is not caused by saturated fat consumption? Maybe infant mortality? Because I’d guess that if we ran the same analyses predicting that health outcome then we’d find a similar pattern of results to what we find with heart disease & stroke. Which would suggest that the results don’t provide evidence of a causal relationship.

  16. Vince, I agree that other data might come along that would change how persuasive I find these data. If someone finds that more X (not saturated fat) reduces heart disease (causality is established), and that greater development is correlated with more X, then yes, this data would become less persuasive that saturated fat is healthy.

    In reply to your question — is there some health benefit correlated with increasing development that I’m “fairly confident” isn’t due to greater saturated fat consumption? — the answer is yes, better vision. (Due to more access to glasses.) I hope you can see that this sort of thing doesn’t help. Sure, correlations can be misleading. Sure, if we look hard enough we can find an example. They can also be informative.

  17. Vince, Seth is right about the potential for “overcorrection,” which I would call plain confounding. The idea becomes clear when you think in terms of causality instead of statistics. Let’s say we have the following, simple (causal) model, represented as a directed graph:

    SatFat –> Health

    where the arrow –> denotes that saturated fat *causes* some effect on health. We are not saying what that effect is (positive or negative), or whether it is large enough to matter. All we are saying is that in our model, saturated fat has causal influence over health. We are also claiming, by virtue of our arrow not being bidirected or pointed the other way, that health does not cause saturated-fat consumption. (Here our domain knowledge comes into play to rule out an absurdity.)

    Now, if we were confident that there were no other factors that could affect the relationship between saturated fat and health *causally*, we would consider the relationship to be identified, and we could estimate it directly from our observational data. But in this case we have no such confidence: we can think of many factors that could exert causal influence on both saturated fat and health, wealth for example.

    Let us therefore update our causal model to reflect that we expect wealth (as a proxy for societal development) to have some causal effect on health, and also upon saturated-fat consumption (meat being expensive). To Seth’s point, we don’t know what this effect is, but knowing that it probably exists is enough for us to be obliged to update our model:

    SatFat –> Health
    Wealth –> SatFat
    Wealth –> Health

    (At this point, you might want to draw the directed graph on paper, arranging the three nodes SatFat, Health, and Wealth as the points of a triangle whose sides are given by the arrows above.)

    But still, we must recognize our model as incomplete. The effect of wealth on health, for example, is almost certainly not direct. Giving people money doesn’t make them healthier. The effect is likely to be mediated by other factors such as health care, which converts money into health. But for now we can omit such factors for simplicity.

    What we cannot omit is the possibility of other factors that affect both wealth and health (e.g., government policy, social stratification) or both wealth and saturated-fat consumption (e.g., religious beliefs, geography). To represent the potential for these unaccounted-for causal influences, we can introduce mystery edges to our model:

    SatFat –> Health
    Wealth –> SatFat
    Wealth –> Health
    Wealth <–> SatFat
    Wealth <–> Health

    where X <–> Y denotes that an external factor (or factors) influences both X and Y.

    Now, here’s where it gets interesting. One of the recent findings of causality theory [1] is that you can use the information encoded in a directed graph to determine which causal relationships are identified, that is, can be determined from observational (nonexperimental) data. I won’t go into all the background, but in a graph like the one above, if you condition on a variable such as wealth, you will confound the relationship between saturated fat and health.

    For a short explanation of this phenomenon, consider the following 3-node graph:

    X –> Z Health
    SatFat Wealth Health

    Thus conditioning on wealth blocks the first path but opens the second; not conditioning leaves the first open. What this means is that we will need a more sophisticated model (and more corresponding observational data) if we want to ferret out the causal effect of saturated fat on health. (I’m working on some of these models in my spare time. If I come up with anything interesting, I’ll be sure to share my findings.)

    Cheers,
    Tom

    [1] http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf
    [2] http://www.andrew.cmu.edu/user/scheines/tutor/d-sep.html
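    A rough sketch of the path-blocking argument, as far as it can be read from the surviving text of the comment above. The two “mystery edges” are modeled here as hypothetical unobserved nodes U1 (affecting Wealth and SatFat) and U2 (affecting Wealth and Health); that encoding, and all function names, are assumptions. The code lists every path between SatFat and Health and applies the standard d-separation rules: a chain or fork is blocked by conditioning on its middle node, while a collider is blocked unless it (or one of its descendants) is conditioned on.

```python
# Directed edges (cause, effect). U1 and U2 are hypothetical unobserved
# factors standing in for the two "mystery edges".
EDGES = [
    ("SatFat", "Health"),
    ("Wealth", "SatFat"),
    ("Wealth", "Health"),
    ("U1", "Wealth"), ("U1", "SatFat"),
    ("U2", "Wealth"), ("U2", "Health"),
]


def neighbors(node):
    """Nodes joined to `node` by an edge in either direction."""
    out = set()
    for a, b in EDGES:
        if a == node:
            out.add(b)
        if b == node:
            out.add(a)
    return out


def simple_paths(start, goal, path=None):
    """Yield every path from start to goal that never revisits a node."""
    path = path or [start]
    if path[-1] == goal:
        yield list(path)
        return
    for nxt in sorted(neighbors(path[-1])):
        if nxt not in path:
            yield from simple_paths(start, goal, path + [nxt])


def descendants(node):
    """All nodes reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for a, b in EDGES:
            if a == cur and b not in found:
                found.add(b)
                stack.append(b)
    return found


def is_open(path, conditioned):
    """Apply the d-separation blocking rules to one path."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in EDGES and (nxt, mid) in EDGES
        if collider:
            # A collider blocks unless it or a descendant is conditioned on.
            if mid not in conditioned and not (descendants(mid) & conditioned):
                return False
        elif mid in conditioned:
            # A chain or fork is blocked by conditioning on its middle node.
            return False
    return True


def open_paths(conditioned):
    return [p for p in simple_paths("SatFat", "Health")
            if is_open(p, conditioned)]


for label, cond in [("nothing", set()), ("Wealth", {"Wealth"})]:
    print(f"Conditioning on {label}:")
    for p in open_paths(cond):
        print("   open:", " - ".join(p))
```

    In this encoding, conditioning on Wealth closes the paths that run through Wealth as a fork or chain but opens the path SatFat - U1 - Wealth - U2 - Health, where Wealth is a collider. The direct SatFat –> Health path is therefore still mixed with a non-causal one either way.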

  18. Sorry, folks. My previous post, if you tried to read it, gets hopelessly disconnected about halfway through. It seems that WordPress’s markup filter has deleted a few of its paragraphs and mangled a few of its diagrams. (Lesson: don’t draw diagrams using symbols that WordPress is likely to mistake for HTML.)

    I have posted the original text here:

    http://community.moertel.com/~thor/blog/seths-blog-satfat-causality.txt

    It should be much, much easier to understand. 😉

    Cheers,
    Tom

  19. Protein is the factor that makes the picture more complete.
    Poor people eat more of a percentage of animal fat calories than animal protein calories. Eat fat not protein {After 60 about 6% seems to be right but more testing is needed}

    Protein is needed for health. Too much or too little can kill or cause illness.

    http://www.impactaging.com/papers/v1/n10/full/100098.html

    The right percentage for someone 60 or over can add as many healthy years as caloric restriction with better overall health and appearance.
