Analysis using the Gene Set Enrichment Analysis (GSEA) tool

Gene Set Enrichment Analysis is one of many approaches to the analysis of gene expression profile data and is described in a paper from workers at the Broad Institute.

The basic concept was prompted by the observation that studying individual genes showing the most significant difference in expression level between two states or phenotypes is lacking in mechanistic insight. Instead, it makes more sense to take a set of genes sharing some biological link, and ask the question – does the whole set show any statistically significant enrichment in those genes that have differential expression?

A gene set can be chosen, a priori, for a number of reasons e.g. the set of genes known to be influenced by over- or under-expression of a micro-RNA, or perhaps a set chosen based on chromosomal location, or genes for which molecular function, cellular component and / or biological process have been assigned using the controlled vocabularies of the Gene Ontology.

One advantage of the GSEA approach is that it is possible to incorporate your complete data set, not just those transcripts that pass an arbitrarily chosen differential expression threshold. I am sure that many people reading this will be thinking – “How can it be OK to use the complete dataset? Normally I would only consider genes with >2-fold (or other favourite value) differential expression.” The reason the approach is valid is that genes expressed at low levels or with large variance between replicates do not contribute to the main metric used by GSEA, the ‘enrichment score’ (ES).

GSEA works by first ranking the expression value for each gene by signal to noise ratio – calculating the difference between the mean values for samples representing each phenotype and scaling them by the sum of the standard deviations. This means that genes with large differences in expression level between different states and little variation between biological replicates are ranked highly.
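As a rough illustration of the ranking step described above, here is a minimal sketch in Python. The gene names, sample names and expression values are invented for the example, and the function names are hypothetical; this is not the GSEA software's own code, and it does not handle genes with zero variance in both phenotypes.

```python
from statistics import mean, stdev

def signal_to_noise(values_a, values_b):
    """Difference of the phenotype means, scaled by the sum of the standard deviations."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))

def rank_genes(expression, samples_a, samples_b):
    """Return (gene, score) pairs sorted from highest to lowest signal-to-noise."""
    scores = {
        gene: signal_to_noise(
            [values[s] for s in samples_a],
            [values[s] for s in samples_b],
        )
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented toy data: three genes, three replicates per phenotype.
expression = {
    "GENE_A": {"a1": 10.0, "a2": 11.0, "a3": 12.0, "b1": 2.0, "b2": 3.0, "b3": 4.0},
    "GENE_B": {"a1": 5.0, "a2": 9.0, "a3": 1.0, "b1": 4.0, "b2": 8.0, "b3": 0.0},
    "GENE_C": {"a1": 2.0, "a2": 3.0, "a3": 4.0, "b1": 10.0, "b2": 11.0, "b3": 12.0},
}
ranked = rank_genes(expression, ["a1", "a2", "a3"], ["b1", "b2", "b3"])
for gene, score in ranked:
    print(gene, round(score, 3))
```

GENE_A has a large difference and tight replicates, so it ranks first; GENE_B has a small difference and noisy replicates, so it sits near zero in the middle of the list.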

The next step is that the ES, the primary statistic generated by GSEA, is calculated for each gene set – in the GSEA manual, which documents the software excellently, it states:

“All genes are first ranked by their signal to noise ratio, then the ES is calculated by “walking” down the ranked list of genes increasing a running-sum statistic when a gene is in the gene set and decreasing it when it is not. The magnitude of the increment depends on the correlation of the gene with a phenotype. The ES is the maximum deviation from zero encountered in walking the list. A positive ES indicates gene set enrichment at the top of the ranked list; a negative ES indicates gene set enrichment at the bottom of the ranked list.”
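The “walk” in that quotation can be sketched in a few lines of Python. This is an illustrative reading of the description, not the GSEA implementation: the increment for a gene in the set is weighted by the absolute value of its ranking score (the `weight` exponent is an assumed parameter name), the decrement for a gene outside the set is uniform, and the gene names and scores are invented.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """ranked: list of (gene, score) pairs sorted from highest to lowest score."""
    hit_weights = [
        abs(score) ** weight if gene in gene_set else 0.0
        for gene, score in ranked
    ]
    total_hit_weight = sum(hit_weights)
    n_miss = sum(1 for gene, _ in ranked if gene not in gene_set)
    if total_hit_weight == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running = 0.0
    es = 0.0
    for (gene, _), hit_weight in zip(ranked, hit_weights):
        if gene in gene_set:
            running += hit_weight / total_hit_weight  # step up on a hit
        else:
            running -= 1.0 / n_miss                   # step down on a miss
        if abs(running) > abs(es):                    # keep the maximum deviation from zero
            es = running
    return es

# Invented ranked list: highest signal-to-noise first.
ranked = [("G1", 4.0), ("G2", 3.0), ("G3", 1.0), ("G4", -1.0), ("G5", -3.0), ("G6", -4.0)]
es_top = enrichment_score(ranked, {"G1", "G2"})     # set concentrated at the top
es_bottom = enrichment_score(ranked, {"G5", "G6"})  # set concentrated at the bottom
print(es_top, es_bottom)
```

A set sitting entirely at the top of the list gives a positive ES and a set at the bottom gives a negative one, matching the sign convention in the manual.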

The ES values are normalised based on gene set size and then a false discovery rate is calculated, to give an estimated probability of false positives. GSEA uses a very relaxed default value of 25%, which is suitable for hypothesis generation with a relatively large number of biological replicates.
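One common way to make ES values comparable between gene sets of different sizes is to divide each observed ES by the mean of the same-signed ES values obtained from permuted data for that gene set. The sketch below assumes those null ES values have already been computed; the function names and numbers are hypothetical, and the full FDR calculation across gene sets is not shown.

```python
from statistics import mean

def same_sign_null(observed_es, null_es):
    """Keep only null ES values with the same sign as the observed ES."""
    kept = [es for es in null_es if (es >= 0) == (observed_es >= 0)]
    if not kept:
        raise ValueError("no null ES values share the sign of the observed ES")
    return kept

def normalised_es(observed_es, null_es):
    """Observed ES divided by the mean magnitude of the same-signed null ES values."""
    return observed_es / abs(mean(same_sign_null(observed_es, null_es)))

def nominal_p(observed_es, null_es):
    """Fraction of same-signed null ES values at least as extreme as the observed ES."""
    kept = same_sign_null(observed_es, null_es)
    return sum(1 for es in kept if abs(es) >= abs(observed_es)) / len(kept)

# Invented null distribution for one gene set (a real run would use many permutations).
null_es = [0.2, 0.4, 0.6, -0.3, -0.5]
print(normalised_es(0.8, null_es), nominal_p(0.8, null_es))
print(normalised_es(-0.4, null_es), nominal_p(-0.4, null_es))
```

Because the null distribution is generated for the same gene set, its spread already reflects the set's size, which is why dividing by it acts as a size correction.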

Scientists working on data from non-human samples can still use GSEA, but need to beware – the gene symbols used by GSEA are “translated” from their human equivalents i.e. identifiers used for genes from your species of interest represented on the microarray are converted into symbols for their human orthologues, then used in the analysis. Subramanian and colleagues claim that this conversion has little or no effect on the utility of GSEA; it has been used successfully in multiple non-human species, but of course this must be kept in mind when investigating results in detail.

For an excellent, in-depth, review of pathway tools, consult:

Khatri, P., Sirota, M., & Butte, A. J. (2012). Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges. PLoS Computational Biology, 8(2), e1002375. doi:10.1371/journal.pcbi.1002375

Another good source of advice on pathway analysis, especially for those familiar with the R statistics package, is here.

Further reading

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102:15545-15550

Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M (2005) Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature 434:338-345

Posted in Analysis | 1 Reply

The joys of editing academic textbooks

Image courtesy of ningmilo / FreeDigitalPhotos.net

Or: "a beginner's guide to herding cats".

Imagine the scenario: you are an experienced academic in a busy research institute and you are invited to edit a book. You hesitate, knowing how much work is involved, but you keep being told that it will be good for your CV. You agree, it will be good for your CV, so you cheerfully set about editing your first scientific textbook.

So, why is that a problem?

Getting authors on board

You need authors who will write good chapters. You Google some big-name experts and invite them to contribute a chapter to your book. Almost all of them decline, or do not reply to your email at all. But, much to your amazement, one agrees. However, this paragon of academia then never replies to any later message. So, you lower your sights and aim for good scientists, but not Nobel prize winners. Eventually, you get enough authors together to write chapters around the title that the publishers gave you – phew!

Getting authors to agree a deadline

Provided that it is not unreasonable, everyone will be fairly relaxed about the deadline that is set. However, the real challenge is:

Getting them to stick to it.

  1. This should be easy, right? Scientists are grown-up, professional people. Aren't they? Well, yes and no. In reality, academics are usually over-committed: they not only have to research and teach, but nowadays also write grant applications, papers, reviews, book chapters, etc. After all, the scientific motto is "publish or be damned."
  2. As the deadlines go past – "wooshh", like passing cars – half your authors have sent their chapters, the rest have not. Now comes a sticky moment – these are meant to be cutting-edge reviews. State-of-the-art. But the delays mean that the work of the 'good' authors is now past its sell-by date. You have to go back to them for updates. They are mostly not too happy about this, and you can hardly blame them.
  3. One more thing that I forgot to mention; as the editor, you have to read the chapters. What's more, you have to make cogent critiques – whether the author should add, remove, expand or condense material. Even if the topic is on the fringe of your own area of expertise.

What if an author goes AWOL?

What do you do when one of your authors decides that they are not going to write their chapter? Not just missing deadlines, but not communicating at all. Vanished off the map. So, now you're stuck – find another author(s) – more delay – write the chapter yourself? – but it is far outside your own area of expertise. So, eventually, you find someone else. Which means yet more delay.

Writing your own chapter

Oh, yes, you forgot that you agreed to write one of the chapters yourself. Oops. Oh well, no problem. Offer co-authorship to one of your PhD students – they will jump at the chance to get another publication on their CV. Or perhaps not: no, they are not keen; obviously suspecting (correctly) that your plan is for them to write the whole thing, then submit the chapter to you for a little light editorial polishing.

Pleading with the publishers for more time

  1. You now hold the dubious record for the longest time taken to write an academic textbook in human history, excluding the Bible.
  2. 'Please, sir, I want some more.'
  3. The publishers are not impressed but, quietly resigned, tell you to go away and come back when you have met your new deadline.

Losing your nerve and giving it all up

It is all taking far too long – several authors still have not sent even a first draft of their chapters. You start to get desperate – the original deadline was so long ago that you’ve forgotten it – the “new” deadline is also now history. You consider giving the whole thing up – apologise to the authors and the publishers and say the book can’t be finished. But your co-editor and the authors who have delivered on time are indignant – naturally enough they don’t want to see their work wasted and insist that you go back to the recalcitrant scientists with a big stick. How do you threaten authors with a stick by email? Or by phone? However, a combination of the metaphorical big stick, pleas for mercy and piling on the guilt eventually works and all the chapters are delivered! Hooray.

Hooray!

So, now, you’re on the last lap. Or the last dregs – the soul-destroying process of assembling the index and proofreading. Once, a sub-editor with a scientific background might have written an index, but not now. Academic publishers want their pound of flesh, so this task is delegated to authors and editors. Authors select keywords from their chapters, with varying degrees of enthusiasm or accuracy, then the editor attempts to assemble them into something useful to the reader. Finally, a draft proof arrives by email. You are now heartily sick of every word, but a final spurt of enthusiasm drives you on and the book is finished.

One more thing – did I forget? – you don’t get paid – but you are given a few free copies of your own book. Such fun!

 

Posted in Light relief | 1 Reply

How does a mutation in a transcription factor cause glue ear?

Otitis media, sometimes known as “glue ear”, is the most common bacterial infection in children and by 1 year of age about 60% of children will have had an episode. Sometimes, children develop a chronic condition in which, even after the infection has been treated, the “glue” doesn’t go away and causes deafness. In an inherited mouse model of chronic glue ear the causative mutation has been shown to be in a gene encoding a transcription factor, Evi1.

The EVI1 protein has multiple domains, can repress or enhance expression of target genes and can interact with many other proteins. Indeed, the multiplicity of known and potential interactions is a challenge to determining the role of the mutation. There were clues, however, as to how this mutation might lead to disease from differences in phenotype e.g. mutant mice raised in a “clean” SPF animal facility were less likely to become deaf than those kept in the older, “dirty” animal house.

Did this mean that gene-environment interactions, e.g. between the immune system and microbes, influence disease susceptibility? It was also known that mutant mice showed high levels of influx of neutrophils into their middle ear cavities (inflammation), but it was unclear whether EVI1 was acting directly or indirectly in this process. Possible answers to these questions came recently from studies in cultured cells, showing that EVI1 can act as an inhibitor of one of the key proteins regulating inflammation, another transcription factor, nuclear factor kappa B (NFkB). EVI1 binds to one of the subunits of NFkB and interferes with a critical protein modification, acetylation. However, EVI1 does not acetylate proteins directly, so other factors must be involved. What were those other factors?

I combined public and unpublished data using literature searches and open source software, e.g. iRefWeb, in order to identify steps in the NFkB signalling pathways that might be disturbed by the mutation in EVI1. The novel target proteins and starting points for drug development I discovered are suitable for testing in this preclinical model of chronic otitis media.

Read our testimonial from Dr Michael Cheeseman.

 

Posted in Analysis, Project foundations | Leave a comment

Project foundations for childhood-onset asthma

Asthma is caused by a combination of environmental and genetic influences, but these are still poorly understood. A "hit" detected in a genome-wide association scan (GWAS) for childhood asthma led a client to believe that one gene might be partially responsible. Proving that this genetic association really is a cause of asthma is, however, difficult. Firstly, no one knew the function of the protein made by the gene and secondly, changing genes in humans to test a hypothesis, rather than as therapy, is technically challenging and ethically questionable, especially in children. Fortunately, mice share about 90% of their genes with humans, so scientists “knocked-out” the equivalent gene, then tested whether these animals behaved like children with asthma. The short answer is – they didn’t. In lung-function tests that would have had asthmatics reaching for their inhalers, the knock-out mice were completely normal. So, what was going on? Were mice not enough like humans? Was this the wrong gene?

For this project, I went back to first principles – what was the evidence supporting the idea that this gene was responsible for increased asthma risk? Digging through the online literature, in particular papers from other groups studying the same gene and supplementary material not available in print, I found suggestions that the genetic effects were more complex. I found evidence that two other genes nearby were either more or less transcriptionally active in asthmatics and so might play a role in susceptibility to asthma. Finally, using data from the ENCODE project, I found that the regulatory element predicted to control these genes was conserved in mice, so it would be possible to test the predictions experimentally.

This suggested a novel therapeutic target – altering the activity of a cluster of genes, rather than just one, might alter disease risk.

Testimonial

Posted in Project foundations | Leave a comment

Analysis of gene expression data – reduced male fertility / sterility

A group of animals that can breed and produce fertile offspring is one of the definitions of a species.

This means that the biological mechanisms of fertility and infertility are of interest not only to evolutionary biologists, but also to clinicians and of course to the wider public. At the Institute of Molecular Genetics in Prague, Prof. Jiri Forejt is studying what controls fertility in the hybrid offspring produced by the mating of mouse sub-species. He wanted to know why some male mice were infertile – he knew that genes in one particular genome region were important, but not how those genes influenced the expression of the rest of the genome.

This is where I was recruited into the team, to help with identifying the classes of genes disrupted in mice with reduced fertility. Scientists in his group had produced Affymetrix gene expression results from the testes of fertile, sub-fertile and infertile mice and I analysed these data genome-wide for differentially-expressed transcripts. Using the Broad Institute’s marvellous GSEA tool, I assessed the statistical evidence that specific Gene Ontology terms and pathways were over-represented and also whether differential genes were localised to particular genome regions. This analysis uncovered evidence that specific, functionally related sets of genes were over-represented in the expression data and helped to develop novel hypotheses about the causes of reduced fertility.

Posted in Analysis, Project foundations | 1 Reply

Project foundations in genetic muscle weakness

Muscle weakness can be caused by a rare genetic disease called myofibrillar myopathy. Gonzalo Blanco's team found a mouse model of this disease and wanted to know what made the muscles so weak. Their aim was to find drug targets that could be translated into clinical research and testing.

Before I became involved, the disease had already been mapped to a large region of one chromosome and Dr Blanco's team had planned to use a positional cloning approach to find the mutation. I proposed that a faster approach would be to use additional markers followed by targeted sequencing of the genes in that region. I designed the set of probes to enrich the relevant DNA fragments and worked with a bioinformatician, Dr. Michelle Simon, to create a software pipeline to find and characterise mutations.

Once development was complete, the pipeline was used to identify mutations in the muscle weakness mutants and predict that they altered the coding sequences of two genes: Myh4 and Pmp22. Two lines of evidence suggested that the mutation in Myh4, which encodes the muscle protein myosin, was likely to be the cause of the weakness. Firstly, further breeding of the mice showed that only the myosin mutation remained associated with the weakness and secondly, the abnormal protein aggregates found in the mice contained large amounts of myosin.

Scientists at the MRC Mammalian Genetics Unit have used the same approach, which Michelle Simon and I pioneered, to find mutations in other disease models.

Publication in Human Molecular Genetics

Testimonial from Dr. Gonzalo Blanco

Posted in Project foundations | Leave a comment