So this scientist has taken a fall for rather questionable methods in directing scientific experiments.
So what has this to do with security? Well, it sorta falls into the realm of integrity. Kinda like fake news.
And what's wrong with what he was doing? After all, we teach about data warehousing, right? You got a bunch of data: what's wrong with using it to learn things aside from what you originally thought you were going to learn? That's sort of OK if you don't stray too far, but, at some point, you get into the realms of "shoot first: draw the target afterwards."
Poor guy. He’d pretty much turned his targeting computer off and was using the force...
My favourite quote from the article is:
‘Wansink encouraged his students to dig through the numbers to find results that would "go virally big time."’
Not keeping enough of the experimental/survey data seems to have led to his inability to corroborate his papers.
I’d draw the line at saying it’s fake news, as from my reading I think it all happened, but it seems more sciency than science, and well, it seems Cornell was biased against the former conceptoid...
@Early_Adopter wrote: My favourite quote from the article is:
‘Wansink encouraged his students to dig through the numbers to find results that would "go virally big time."’
Considered in isolation, there is nothing wrong with selecting research projects that will have academic, scientific, and public impact. That last aspect of public impact can potentially be measured by how broadly the results 'go viral.' However, Wansink's error appears to be selectively culling the data he and his students analyzed, instead of using the entire data set, and also misusing the data by not ensuring it was proper for the questions being analyzed. I read the entire article and then several articles on "p-hacking," the name given to the improper methods he used. (I recommend readers do the same.) P-hacking is essentially throwing out parts of the data set, or re-slicing it until something looks statistically significant, in order to reach a foregone conclusion; that is a BIG error in the scientific process.
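For anyone who wants to see why this matters, here is a minimal sketch of one common flavour of p-hacking: measuring many outcomes and reporting only the best-looking one. It is not a reconstruction of Wansink's analysis. Every name and number in it (`simulate`, 20 outcomes, 30 subjects per group, a known standard deviation of 1) is an assumption chosen for illustration. The simulated data contain no real effect at all, so every "significant" result is a false positive.

```python
import math
import random


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for 'both groups have the same mean'.

    Assumes the standard deviation sigma is known (a simplification
    made so the example needs only the standard library).
    """
    diff = sum(a) / len(a) - sum(b) / len(b)
    se = sigma * math.sqrt(1.0 / len(a) + 1.0 / len(b))
    z = diff / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(n_studies=1000, n_outcomes=20, n_per_group=30, alpha=0.05, seed=1):
    """Return (honest_rate, hacked_rate) of false positives.

    Hypothetical setup: each study measures n_outcomes unrelated outcomes
    on two groups drawn from the SAME distribution (no true effect).
    """
    rng = random.Random(seed)
    honest_hits = 0
    hacked_hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            p_values.append(z_test_p_value(a, b))
        # Honest analyst: tests only the outcome chosen before looking.
        if p_values[0] < alpha:
            honest_hits += 1
        # P-hacker: digs through all outcomes and reports the best one.
        if min(p_values) < alpha:
            hacked_hits += 1
    return honest_hits / n_studies, hacked_hits / n_studies


honest_rate, hacked_rate = simulate()
print(f"false positives, pre-chosen outcome:   {honest_rate:.2f}")
print(f"false positives, best of 20 outcomes:  {hacked_rate:.2f}")
```

With independent tests at a 0.05 threshold, the chance that at least one of 20 comes up "significant" by luck is 1 - 0.95^20, roughly 64%, so the second rate should land far above the first. That is the statistical version of "shoot first: draw the target afterwards."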
Further, using previously collected data for new investigations is perfectly proper, as long as the target population, means of sampling, and nature of the data collection process all comport fully with the new investigation. There is absolutely no reason to insist that every data analysis to test new hypotheses have a new and unique data collection step.
@Early_Adopter wrote: Not keeping enough of the experimental/survey data seems to have led to his inability to corroborate his papers.
The scientific community has recognized a challenge in validating prior research, caused by the difficulty in obtaining funds to replicate someone else's prior research before building on that work with new investigations. Part of the solution has become a move to maintain the original raw data and make it available for others to replicate the analysis, and in some cases, re-use the data set for testing new hypotheses.
Yeah, I took the ‘dig through the numbers to find...’ to indicate selection of something out of the sample that proved the favoured hypothesis, or at least would be popular. It sounded analogous to cherry picking to support an argument.
There’s something in shooting called ‘chasing the error around the target’. It is essentially caused by discounting ‘bad’ groups when trying to ‘zero’ an inconsistent shot. The instructor tries to decide which group to use as the shooter is firing and corrects the sights, and the zero is no good. It’ll normally be down to the stability of the base, flinching or snatching, but it will take longer to diagnose if you select only the things that look like a tight group. If you take all the groups together, the distribution sits around the center of mass, and you’ll see you need to teach the principles of marksmanship, and probably do a bit of coaching.
If he’d had his raw data I guess he could have re-run things, or got someone else to do so, and perhaps the results would have been OK, or, as you say, he could have re-run things and at least drawn a new conclusion.
To your point on data, I recently met a chap from a university who handled storage. He told me that they had to just keep all of the data from researchers. It was in petabytes, and they just keep adding boxes. Quite often people would not be contactable afterwards, and keeping it all is quite expensive. At some point the problem is going to get too big, and then at least a portion of it will be lost.