Finally done here with my experience, and I wish I had more time to keep researching so that I had something a little more "finalised" to present. But I guess that's what past EXP kids meant when they said that 8 weeks of research is not enough, and I'll have to work with what I've got.
To solve the problem of not having enough data points, we used the online TCGA database as the source of raw data for calculating mutation rates. The mutation rates were calculated with an R script that Dr. Cannataro had written. Because the mutation rates were tumor specific, we had to adjust the proportions obtained from the IARC database using data from another database called cBioPortal. Basically, we had to multiply the number of times a certain variant was seen in the IARC database by the percentage of tumors that have a TP53 mutation, because our mutation rates are calculated across all tumors in a specific cancer, not just the TP53-mutated ones (confusing, I know).
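To make that adjustment a bit less confusing, here is a minimal sketch of the arithmetic in Python (the real work was done in Dr. Cannataro's R script, which is not shown here). The function name, the variant counts, and the 40% figure are all made up for illustration, and I am assuming the counts are first turned into proportions before being scaled.

```python
def rescale_variant_proportions(variant_counts, tp53_mutated_fraction):
    """Scale variant proportions from "among TP53-mutated tumors"
    to "among all tumors" for one cancer type.

    variant_counts: dict mapping a variant name to how many times it was
        seen (hypothetical stand-in for counts taken from IARC).
    tp53_mutated_fraction: fraction of all tumors of this cancer type that
        carry a TP53 mutation (hypothetical stand-in for a cBioPortal value).
    """
    if not 0.0 <= tp53_mutated_fraction <= 1.0:
        raise ValueError("tp53_mutated_fraction must be between 0 and 1")
    total = sum(variant_counts.values())
    if total == 0:
        raise ValueError("variant_counts must contain at least one observation")
    # Proportion within TP53-mutated tumors, multiplied by the share of
    # tumors that are TP53-mutated at all.
    return {
        variant: (count / total) * tp53_mutated_fraction
        for variant, count in variant_counts.items()
    }


# Made-up numbers, purely to show the calculation.
example_counts = {"R175H": 50, "R248Q": 30, "R273H": 20}
example_fraction = 0.4
rescaled = rescale_variant_proportions(example_counts, example_fraction)
```

With these invented inputs, the rescaled proportions add up to the TP53-mutated fraction instead of to 1, which is the whole point of the adjustment: they are now on the same "all tumors" scale as the mutation rates.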
After graphing the mutation rates against the RFS scores, we ended up with a graph similar to Fig4B in the RFS article. Like the original graph, ours shows two clouds of data points, but the difference in size between the two clouds is smaller. The change is not huge, but it does suggest that mutation rate, rather than prevalence, is the more meaningful thing to look at when arguing that a mutation is significant in tumor progression.
Now I am looking forward to senior year and my continuing exploration of the sciences. Biostatistics may not be something I want to pursue, but at least I now know how to use R.
P.S. Turns out the research I wanted to do in the first place (phylogenies) wasn't being conducted over the summer because Dr. Townsend didn't receive a grant to do so :( So I guess I kinda feel guilty for feeling frustrated that I didn't get to do that.
P.P.S. Check out this cute podcast my PI's daughter was a part of!
http://www.sciencepodcastforkids.com/single-post/2018/08/03/The-Girl-Who-Spoke-Science