

RNA_PEref version 1.0

A set of Perl tools to create a reference sequence set for paired-end mapping of short transcriptome sequencing reads.

novoPile version 1.27 (and an early version of novoPhase)

Novopile is a C++ tool for direct processing of the novoalign output, creating a pileup file designed to match the MAQ pileup output. Novopile can directly process gzipped files, which is useful given the size of the novoalign output files. In addition to the novoalign output, the user must also provide a query file listing the SNPs for which output is wanted. The format is simply:
seq1 pos1 ref1
seq2 pos2 ref2
...
seqn posn refn

The third column is the reference allele from the target reference genome. There can be more columns, and these should be ignored by the software. Otherwise, just running ./novoPile should list all the available options.
In particular, the user can specify an output file to collect all the small indels. Another option is to specify an output file for summary information about the novoalign output (mapping qualities, base composition...).
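As an illustration of the query file format described above, here is a minimal Python sketch that reads such a file and keeps only the first three columns. It is not part of novoPile. The function name read_query and the sequence names in the example are hypothetical, and treating the position as an integer is an assumption.

```python
import io


def read_query(handle):
    """Yield (seq, pos, ref) tuples from a whitespace-delimited query file.

    Only the first three columns are used; any extra columns are dropped.
    """
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        seq, pos, ref = fields[:3]
        yield seq, int(pos), ref  # assumption: pos is an integer coordinate


# Example with an in-memory file; "chr1" and "chr2" are made-up sequence names.
example = io.StringIO("chr1 100 A extra_column\nchr2 250 G\n")
rows = list(read_query(example))
```

The same function works on a file opened with open() or gzip.open(path, "rt").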
Novophase is a tool used to phase sequencing data using short reads. Its options are similar to those of novoPile. It looks for paired reads that overlap two heterozygous SNPs; this information allows the phasing of these two SNPs. This tool is in its early stages but usable, so please get in touch with me if you are interested in its applications.
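The idea of phasing two heterozygous SNPs from read pairs that cover both can be sketched as follows. This is a simplified illustration in Python, not Novophase's actual algorithm: the function phase_pair, the "cis"/"trans" labels and the simple majority rule are all assumptions made for the example.

```python
from collections import Counter


def phase_pair(observations, alleles_a, alleles_b):
    """Decide the phase of two heterozygous SNPs from read-pair observations.

    observations: one (allele_at_snp_a, allele_at_snp_b) tuple per read pair
                  that covers both SNPs.
    alleles_a, alleles_b: the two alleles at each SNP, e.g. ("A", "G").
    Returns (call, cis_count, trans_count), where call is "cis" if the first
    alleles of both SNPs travel together, "trans" if they are on opposite
    haplotypes, and None if the counts are tied.
    """
    a0, a1 = alleles_a
    b0, b1 = alleles_b
    counts = Counter(observations)
    cis = counts[(a0, b0)] + counts[(a1, b1)]
    trans = counts[(a0, b1)] + counts[(a1, b0)]
    if cis == trans:
        return None, cis, trans
    return ("cis" if cis > trans else "trans"), cis, trans


# Four made-up read pairs: three support A-C / G-T, one supports A-T.
obs = [("A", "C"), ("G", "T"), ("A", "C"), ("A", "T")]
result = phase_pair(obs, ("A", "G"), ("C", "T"))
```

A real implementation would also need to account for base and mapping qualities; the majority rule here only shows why reads spanning both SNPs carry phase information.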

QTLMatch version 0.8

QTLMatch is a small R package linked to a paper currently under review in Biostatistics. Its purpose is to test whether two associations (typically an eQTL and a disease association, but maybe two eQTLs or two disease associations) that colocalize in the same genomic region are consistent with a unique causal variant. The current version should be usable and the documentation complete but I suspect that a few more iterations will be needed to reach a really stable 1.0 version.

SNP typing: JAPL

The paper "A method to address differential bias in genome wide association studies" presents a new method for SNP scoring. This method is designed for case-control situations where several cohorts need to be typed together but one must also allow for differences in DNA quality, or similar differences, across samples. This method has worked successfully on the Wellcome Trust Case Control Consortium data as well as the Diabetes and Inflammation Laboratory MIP nSNP scan. The code is available here. The settings are meant for the Affymetrix 500K but the program is very flexible and can easily be adapted. There are plans to extend the work to the Illumina platform.

Requirements:

The code has been written for a Unix/Linux machine. It is written in C++ and needs a few commonly used libraries: GSL, the C++ standard library, and the Boost libraries.

Usage:

To make it easier to use, JAPL has been designed to accept the same options as Chiamo++, the software that was used to call the WTCCC data. Essentially, replacing chiamo with JAPL should call this alternative algorithm instead (apart from a few details).

JAPL outputs a quality score. A reasonable threshold for Affy or Illumina data is 2. This score is found either in the summary file or in the individual file generated for each SNP. The output format matches the input format of snpMatrix, an R package designed for the association testing step.

Flags:

-i file_1 ... file_n
Specifies n input files containing the normalized intensity data for the n cohorts.
This gzipped format is the same one that CHIAMO++ uses (here is an example file).
The normalized intensity files are created from the raw CEL file data using a program written by Hin-Tak Leung (hin-tak.leung@cimr.cam.ac.uk).
These input files must be gzipped, which is the default format generated by Hin-Tak's program.
Please contact him if you need a copy of this program.

-snps
Takes as an argument one file that contains the list of SNPs one wants to type.
If you do not specify this option, JAPL still runs but types all the SNPs in the input files.
Hin-Tak's program uses both the rs identifier and a SNP identifier (typically the Affymetrix ID).
Either of the two identifiers can be used (mixing is possible).
You can alternatively specify two numbers (start and end). The program will then score all SNPs in the list between start and (end-1).

-o
Takes as an argument one folder where the results of the typing will be placed.
JAPL will generate one file per SNP, in a format convenient for snpMatrix (which you can use to do the association testing).

-summary
Takes as an argument one file where the summary of the scoring will be written (MAF, quality score, identifier).

-inputFormat
Optional argument. The default is "hintak", the format generated by Hin-Tak Leung's normalization programs. It also accepts "illuminus" if you are working with that input format.

-perturbations
Optional and experimental. Meant to assess the robustness of the calls by adding a perturbation to the input intensities.
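Because the two-number form of -snps scores SNPs from start up to (end-1), consecutive ranges can be chained without overlap, which is convenient for splitting a large scan into several jobs. The Python sketch below only builds argument lists and does not run anything. Several points are assumptions rather than documented behaviour: the executable name "JAPL", passing the two numbers directly after -snps, indexing from 0, and all file names, which are placeholders.

```python
def chunk_ranges(n_snps, chunk_size):
    """Split n_snps SNPs into half-open (start, end) ranges that tile the list."""
    return [(start, min(start + chunk_size, n_snps))
            for start in range(0, n_snps, chunk_size)]


def japl_args(inputs, out_dir, summary, start, end):
    """Build a hypothetical JAPL argument list for one (start, end) range.

    Only flags listed in the table above are used; the executable name and
    the placement of start/end after -snps are assumptions.
    """
    return ["JAPL", "-i", *inputs,
            "-snps", str(start), str(end),
            "-o", out_dir,
            "-summary", summary]


# Ten SNPs in chunks of four; the last range is shorter.
ranges = chunk_ranges(10, 4)
commands = [
    japl_args(["cohort1.gz", "cohort2.gz"], "out", f"summary_{s}_{e}.txt", s, e)
    for s, e in ranges
]
```

Each end value is the next start value, so every SNP falls in exactly one range.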

Ancestral structure in human populations

The code provided here is meant to complement the Plagnol and Wall 2006 paper. I provide the code I used to analyze the data. It should compile on any Linux machine. This code also contains many routines that I accumulated during my PhD and that are not directly relevant to the paper. It should be easy to install but might not be very easy to use; this is being updated now. Large chunks of the code have been imported from the work of R. Hudson, whom I thank very much.

However, this is rather old code and not as clean as it should be. I am currently revisiting it and hope to clean it up in the process. The result is a rather experimental R package called Rcoal. Here is the vignette that illustrates its functions.

 
