class: center, middle, inverse, title-slide # Nanopore on the command line ## and python3 virtual environments ### Alexis Lucattini ### 2017/08/17 --- # Installing commandline tools (0 to 10 minutes) Mac OSX by default does not have command line tools installed. This is a simple fix as shown [here](http://railsapps.github.io/xcode-command-line-tools.html) Open up 'Terminal' and type in the following. ```bash xcode-select -p ``` `/Applications/Apple Dev Tools/Xcode.app/Contents/Developer` shoud appear as the output. If not, type: ```bash xcode-select --install ``` --- # Install anaconda3 on desktop (approx. 5 mins) Open up terminal and type in the following lines of code. (Not those starting with '#', these are comments) ```bash # Set the version number anaconda_version="4.4.0" # Download anaconda3 using the wget command wget https://repo.continuum.io/archive/Anaconda3-${anaconda_version}-MacOSX-x86_64.sh # Install anaconda. # -b forces install without asking questions. # -p sets anaconda to be installed in our home directory. bash Anaconda3-${anaconda_version}-MacOSX-x86_64.sh -b -p $HOME/anaconda3 # Now we need to update it. conda update conda # And we may need to install the latest version of git conda install -c anaconda git -y ``` When on a server that uses modules, anaconda may already be installed. If so, just type in the following: ```bash module load anaconda3/4.3.1 ``` --- # Installing Albacore .small[ ### Linux Users: Unfortunately, albacore is not supported by python3.6 on Linux. Therefore we will need to create a python3.5 environment to run our basecalling software on. `https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp35-cp35m-manylinux1_x86_64.whl` ### Mac Users: Albacore is supported on python3.6 Never the less we should create a separate environemnt for albacore to run on anyway. `https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp36-cp36m-macosx_10_11_x86_64.whl` ### Windows Users: Like Linux, Windows Users can only use python3.5. `https://mirror.oxfordnanoportal.com/software/analysis/ont-albacore-1.2.6-amd64.msi` ] ??? Worth checking out that Linux distro on Windows. --- # Creating a conda environment (10 mins) An environment is a list of settings where software versions and paths are all calibrated for a particular program or list of programs. However, unlike your general workspace, an environment must be 'sourced' and installations of programs into an 'environment' will not disrupt your general workspace. Here we show an example of creating an environment for albacore ```bash # Swap out python version as 3.5 if we're on our Linux server PYTHON_VERSION=3.5 conda create --name albacore_env python=${PYTHON_VERSION} anaconda ``` --- # Installing albacore in the conda environment .small[ Now we have our albacore environment, we must 'source' it. If you can't remember the name of an environment, you can see all your installed environments using: ```bash conda info --envs ``` Now activate this environment, and install the .whl file using pip ```bash # Activate environment source activate albacore_env # Update the standard conda library (especially important when using conda 3.5) conda update --all # Create a standard yaml file conda env export > standard.yaml # Download albacore pip wheel for mac wget https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp36-cp36m-macosx_10_11_x86_64.whl # Or Linux wget https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp35-cp35m-manylinux1_x86_64.whl # Install albacore using pip pip install ont_albacore-*.whl # Star represents # Write what we have installed to file conda env export > albacore.yaml # Decativate the albacore environment source deactivate ``` ] ??? --- # Installing our other tools. It may be wise to keep albacore as its own environment, and have our other tools in a separate environment. Albacore is quite dynamic, a with a high-frequency of upgrades. ```bash # Create a new environment conda create --name nanopore_tools_env python=3.6 anaconda # Activate the environment source activate nanopore_tools_env # Update the standard conda library conda update --all # Create a standard .yaml file (picture of the blank environment) conda env export > standard_env.3.6.yaml ``` Now let's use conda to install some more analysis tools. --- # Installing useful nanopore tools. We use minimap2 from Heng Li to align these long inaccurate reads to our genome. ```bash # Pauvre, for viewing quality and read-length distributions. conda install -c bioconda pauvre -y # bwa-mem and minimap2 alignment conda install -c bioconda bwa -y conda install -c bioconda minimap2 -y # Samtools and bamtools for sorting and assessing alignments conda install -c bioconda samtools -y conda install -c bioconda bamtools -y # Assemblers: conda install -c bioconda unicycler -y conda install -c bioconda canu -y ``` --- # Check what we have installed ```bash conda env export > nanopore.3.6.yaml diff nanopore.3.6.yaml standard.3.6.yaml | grep '==' > requirements.txt cat requirements.txt # Return to normal environment source deactivate ``` --- # Transferring data across On the laptop running the MinION we will need to do the following. Note sample_name is the name specified when using MinKNOW. ```bash source activate nanopore_env git clone https://github.com/alexiswl/poreduck.git # To run the transfer script we will need to type in the following: ./poreduck/transfer_fast5_to_server.py \ --reads_dir </path/to/reads> \ --server_name <your_hpc> \ --user_name <user_on_hpc> \ --dest_directory </path/to/dest/on/hpc> \ --sample_name <name_of_sample> ``` --- # Running albacore Albacore has two main commands. * `read_fast5_basecaller.py` * `full_1d2_analysis.py` We only use the second one when using the SQK-LSK308 kit. We can check the options by typing in the following into the terminal ```bash source activate albacore_env read_fast5_basecaller.py --help ``` --- # Standard albacore command that are required. ```bash read_fast5_basecaller.py \ --input <path/to/fast5/files> \ --worker_threads <number_of_threads_used> \ --save_path </path/to/albacore/dir/> \ --flowcell <flowcell_version> \ --kit <kit_version> ``` --- # Poreduck: albacore_server_scaled.py ```bash # Download poreduck git clone https://github.com/alexiswl/poreduck.git # Update poreduck cd poreduck git pull origin master ``` --- # Poreduck: albacore_server_scaled.py ```bash # Getting help albacore_server_scaled.py --help # Run albacore through poreduck albacore_server_scaled.py \ --reads_dir </minion/directory> \ --kit SQK-LSK108 \ --flowcell FLO-MIN106 \ --num_threads 5 \ --max_processes 10 ``` --- # Introduction to qsub qsub is the way many users can interact with a HPC, such as Milton. qsub allocates partitions of the server to users in a 'fair' manner. To run qsub, we pipe the 'albacore' command into a 'qsub command' which tells the terminal to run the albacore job on the qsub cluster. --- # Pauvre .pull-left[ * Yield and read-length distribution plots. * Statistic plots ] .pull-right[ <img src=images/pauvre_example.png width="100%"> ] ```bash # Create a margin plot pauvre margin plot --fastq <input_fastq_file> # Create a summary file. pauvre stats --fastq <input.fastq_file> ``` --- # Aligning to the genome ```bash # Create a reference index minimap2 -x map-ont -d reference_index /path/to/reference_genome # Use minimap2 to align to the genome. minimap2 -x map-ont -d reference_index /path/to/fastq > alignment.sam # The output is a sam file. We should convert this to a bam file and sort it. samtools view -b alignment.sam -o alignment.bam # Now sort and index the bam file samtools sort -o alignment.sorted.bam alignment.bam samtools index alignment.sorted.bam alignment.sorted.bai ``` --- # Canu: de novo assembly. Canu is a de novo assembler. i.e completes assembly of a genome without using a reference. It requires the user to have an estimate of the genome size prior to use. Designed for long-inaccurate reads. Corrects, trims and then assembles each genome. ```bash canu \ -d canu_assembly_directory \ -nanopore-raw \ -genomeSize=3g *.fastq ``` Full documentation at: http://canu.readthedocs.org/en/latest/ --- # Unicycler: for hybrid assembly. Unicycler is different to canu as it takes in short reads as a method of polishing the genome. To use Unicycler, you must have short-read illumina data for the same sample. ```bash unicycler \ -1 short_reads_1.fastq.gz \ -2 short_reads_2.fastq.gz \ -l long_reads.fastq.gz \ -o output_dir ``` Full documentation at: https://github.com/rrwick/Unicycler