Is it dishonest to remove outliers and/or transform data?

 

 

 

How much for a bunch of spring onions,

There is no dishonesty in removing an outlier if that “participant” didn’t follow instructions and just tapped one button repeatedly. If a score clearly shifts the mean and does not represent a true score, it should be removed. But although there are good reasons to remove outliers or transform data, some people may do it without a decent reason. Say you ran a piece of research that took many years, with lots of money, your career and your reputation at stake, and tons of pressure to find a significant result. Some may falsify the data, or hide certain aspects of the results, in order to present significant or at least desirable findings. Of course, this is hypothetical, and most people won’t do it. But it does happen, because science is a career-driven discipline: a good reputation provides researchers with funding and ongoing support. Check out this guy, who fell to the dark side of conducting research:

http://news.bbc.co.uk/1/hi/world/asia-pacific/4554704.stm

Removing outliers or transforming data for reasons like these is dishonest. But there are legitimate reasons for removing outliers: for example, participants not following instructions (as I mentioned previously), not understanding the instructions, withdrawing from the experiment, or some other factors that I can’t think of right now. Nevertheless, you have to be very careful when deciding whether to remove outliers, because an outlier may stem from measurement error or may simply be a genuine extreme value that arose by chance. You should examine thoroughly whether the sample still represents the population after the outlier is removed, how greatly the removal affects the results, and so on.
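One way to check how greatly a removal affects the results is to run the same summary with and without the suspect score and report both. The snippet below is only a minimal sketch of that idea: the scores are made up for illustration, and `summarize` is a hypothetical helper, not part of any study discussed here.

```python
from statistics import mean, stdev

def summarize(scores):
    """Return (mean, sample standard deviation) of a list of scores."""
    return mean(scores), stdev(scores)

# Hypothetical scores; the last value is the suspected outlier.
scores = [12, 14, 13, 15, 11, 14, 13, 48]
suspect = 48

with_outlier = summarize(scores)
without_outlier = summarize([s for s in scores if s != suspect])

# Report both versions so a reader can see what the removal changed.
print("with outlier:    mean=%.2f sd=%.2f" % with_outlier)
print("without outlier: mean=%.2f sd=%.2f" % without_outlier)
```

With these invented numbers the mean drops from 17.5 to about 13.1 once the suspect score is excluded. Showing both figures, together with the reason for the exclusion, lets readers judge the decision for themselves.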

It is not dishonest to remove an outlier if the decision is thoroughly considered and backed by well-weighed reasons, but it’s unfair if there is nothing of the sort behind it 🙂

For every move you make there is a certain reason.


9 Comments on “Is it dishonest to remove outliers and/or transform data?”

  1. nim2152 Says:

    But wait! You forgot to mention that not only do you need a good reason to take out this outlier, but you also need evidence to back up your reason. – If that makes sense =/

  2. kmusial Says:

    You have discussed whether it is dishonest to remove outliers, and what I find interesting is your point that science is a career-based discipline. Yes, I agree with you: especially when the findings are limited and the funding for the research comes from private money, people can try to manipulate their data just to get the result they are looking for. In fact, the research of Mark Suster suggests that almost 75% of statistics in the media are manipulated to some extent or even made up (http://www.bothsidesofthetable.com/2010/02/14/73-6-of-all-statistics-are-made-up/).


  3. “…after removing the outlier the sample still represents the population, how greatly it affects the results and so on.”

    Do you not think that outliers are removed in order to predict the population accurately? From what I gather, you are suggesting that the sample would already be representative of the population, even before the outlier has been removed.

    Secondly, the results of an experiment should follow the data. If your results change significantly due to the removal of an outlier, then so be it, because cleaning up your data should make your analysis more accurate anyway.

  4. psychrsjb Says:

    One thing that is clear from your blog is that removing outliers can have great effects on the data, that the methods and justifications for removing outliers are not always honest, and that it can lead to a battle of who has more money.

    There is no standard method for removing outliers, and it is ultimately down to the discretion of the person doing the experiment (http://www.statsoft.com/textbook/basic-statistics/#Correlationse). This in itself is obviously a source of bias, because the person doing the experiment has a vested interest in the results being significant. Therefore I think it would be prudent to try to develop a standardised method for removing outliers. That is very difficult, of course, because every experiment is different, and having said this, there are guidelines from the APA. But perhaps a more stringent method could include a clause under which an independent experimenter checks that the outliers are being removed appropriately.


  5. You have made a concise answer as to your views upon the action that should be taken in response to the identification of an outlier.

    However, the motive for not removing outliers is not necessarily a researcher wishing to advance themselves in the field of science. Perhaps the outliers are simply missed due to ignorance or carelessness.

    In my opinion, if the example in question is an extreme score that causes a large change in the mean, so that the distribution can no longer be considered normal (a bell curve), then it should be considered acceptable to remove it.
    However, there are also several ways of dealing with outliers that mean you do not have to be greatly concerned by them. Using the median instead of the mean can avoid the effect of extreme scores, as can running a non-parametric test, which does not assume normality. A final alternative would be to use a non-linear transformation, which strengthens or weakens the linear relationships between variables and thus changes the correlation between them.
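Two of the alternatives in this comment, using the median and applying a non-linear transformation, can be illustrated with a small sketch. The scores are invented, and the natural log is just one assumed choice of transformation; any transformation has to be applied to every score and stated in the write-up.

```python
import math
from statistics import mean, median

# Hypothetical scores; the last value is an extreme score.
scores = [12, 14, 13, 15, 11, 14, 13, 48]
clean = scores[:-1]

# How far does each summary move when the extreme score is included?
mean_shift = mean(scores) - mean(clean)
median_shift = median(scores) - median(clean)
print("mean shifts by %.2f, median shifts by %.2f" % (mean_shift, median_shift))

# A log transform (applied to ALL scores) pulls the extreme value
# closer to the rest of the distribution without deleting anyone.
log_scores = [math.log(s) for s in scores]
gap_raw = max(scores) - median(scores)
gap_log = max(log_scores) - median(log_scores)
print("gap between max and median: raw=%.2f, log=%.2f" % (gap_raw, gap_log))
```

Here the median moves by only 0.5 while the mean moves by more than 4, which is the sense in which the median "avoids the effect of extreme scores".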

  6. psud0b Says:

    I don’t think that an outlier should be removed just because it doesn’t ‘fit’ with the rest of the data, as there may be a legitimate reason for this! I think that if an outlier is removed, it should be because (as you suggested) the participant didn’t follow or understand the instructions, or for whatever reason didn’t complete the task as they should have done (maybe they were bored in a SONA study and just pressed one button repeatedly until they could leave). In that case, you should be able to back up your reasoning for removing the outlier: if you measured reaction times and the participant’s RT was so fast that they couldn’t possibly have been responding to the stimuli, and so were not taking part in the study properly, this will be reflected in their raw scores and you will have evidence to back up your decision to remove them. If, however, your outlier’s data seem perfectly normal but just don’t fit with the rest of the participants’ data, you have no reason to remove them and should instead focus on coming up with an explanation for why this participant may have been an outlier. This explanation could then spark further research, in which you could try to explain why some participants have scores so different from others. It is also possible that, if you have several outliers, there has been some error in measurement, in which case you should investigate the problem further before removing outliers. In conclusion, outliers should only be removed with good, sound reasoning that can be fully supported by the raw data. Removing outliers to make your data seem ‘better’ or to get a significant result would, in my opinion, be considered dishonest.
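The reaction-time case in this comment could be turned into an exclusion rule that is written down before the results are examined. The sketch below assumes a 200 ms floor for a plausible response and a 50% share of too-fast trials as the exclusion criterion; both numbers, the participant IDs and the `flag_implausible` helper are hypothetical choices for illustration, not established standards.

```python
MIN_PLAUSIBLE_RT_MS = 200  # assumed cutoff, fixed before looking at the results

def flag_implausible(rts_by_participant, min_rt_ms=MIN_PLAUSIBLE_RT_MS,
                     max_fast_share=0.5):
    """Return IDs of participants whose share of too-fast trials exceeds max_fast_share."""
    flagged = []
    for pid, rts in rts_by_participant.items():
        if not rts:
            continue  # no trials recorded: nothing to judge here
        fast = sum(1 for rt in rts if rt < min_rt_ms)
        if fast / len(rts) > max_fast_share:
            flagged.append(pid)
    return flagged

# Hypothetical reaction times in milliseconds.
data = {
    "p01": [450, 520, 480, 610],
    "p02": [90, 85, 110, 95],        # responds faster than anyone could react
    "p03": [900, 1200, 1100, 950],   # slow, but nothing implausible about it
}

flagged = flag_implausible(data)
print("excluded on the pre-set rule:", flagged)
```

Only the button-masher is flagged; the slow participant stays in, because being unusual is not in itself evidence of failing to take part properly. The raw trials behind each exclusion remain available as the supporting evidence.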

