class: center, middle, inverse, title-slide

# Nanopore on the command line
## and python3 virtual environments
### Alexis Lucattini
### 2017/08/17

---

# Installing command line tools (0 to 10 minutes)
Mac OSX by default does not have command line tools installed.  
This is a simple fix as shown [here](http://railsapps.github.io/xcode-command-line-tools.html)

Open up 'Terminal' and type in the following.

```bash
xcode-select -p
```

A path such as `/Applications/Apple Dev Tools/Xcode.app/Contents/Developer` should appear as the output.
If not, type:

```bash
xcode-select --install
```
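
The check and the install hint can be combined into one step. This is a minimal sketch, assuming only that `xcode-select -p` exits with a non-zero status when the tools are missing; the `clt_status` function name is made up for this example.

```bash
# Report whether the command line tools are present.
# Prints "installed" when `xcode-select -p` succeeds, "missing" otherwise.
clt_status() {
    if xcode-select -p >/dev/null 2>&1; then
        echo "installed"
    else
        echo "missing"
    fi
}

# Only suggest the install when the check fails
if [ "$(clt_status)" = "missing" ]; then
    echo "Command line tools not found, run: xcode-select --install"
fi
```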

---

# Install anaconda3 on desktop (approx. 5 mins)
Open up terminal and type in the following lines of code.  
(Skip the lines starting with '#'; these are comments.)

```bash
# Set the version number
anaconda_version="4.4.0"
# Download anaconda3 using the wget command
wget https://repo.continuum.io/archive/Anaconda3-${anaconda_version}-MacOSX-x86_64.sh
# Install anaconda.
# -b runs the installer in batch mode, without asking questions.
# -p sets anaconda to be installed in our home directory.
bash Anaconda3-${anaconda_version}-MacOSX-x86_64.sh -b -p $HOME/anaconda3
# Batch mode does not edit our PATH, so add conda to it for this session.
export PATH="$HOME/anaconda3/bin:$PATH"
# Now we need to update it.
conda update conda -y
# And we may need to install the latest version of git
conda install -c anaconda git -y
```
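
The installer file name depends on the operating system. The sketch below picks the name from the output of `uname -s`. It assumes the Linux installer follows the same naming pattern as the Mac one, and the `anaconda_installer` function is a made-up helper, not part of anaconda.

```bash
# Print the installer file name for a given anaconda version.
# $1 = version, $2 = OS name (defaults to the output of `uname -s`)
anaconda_installer() {
    local version="$1" os="${2:-$(uname -s)}"
    case "$os" in
        Darwin) echo "Anaconda3-${version}-MacOSX-x86_64.sh" ;;
        Linux)  echo "Anaconda3-${version}-Linux-x86_64.sh" ;;  # assumed naming
        *)      echo "unsupported OS: $os" >&2; return 1 ;;
    esac
}

installer="$(anaconda_installer 4.4.0 Darwin)"
# Print the download URL; fetch it with wget, or with `curl -O` if wget is not installed
echo "https://repo.continuum.io/archive/${installer}"
```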

When on a server that uses modules, anaconda may already be installed.
If so, just type in the following:

```bash
module load anaconda3/4.3.1
```

---

# Installing Albacore
.small[
### Linux Users:
Unfortunately, albacore does not support Python 3.6 on Linux.
Therefore we need to create a Python 3.5 environment to run our basecalling software in.  
`https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp35-cp35m-manylinux1_x86_64.whl`

### Mac Users:
Albacore supports Python 3.6 on Mac.
Nevertheless, we should still create a separate environment for albacore to run in.  
`https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp36-cp36m-macosx_10_11_x86_64.whl`

### Windows Users:
Like Linux users, Windows users can only use Python 3.5.  
`https://mirror.oxfordnanoportal.com/software/analysis/ont-albacore-1.2.6-amd64.msi`
]

???

Worth checking out that Linux distro on Windows.

---

# Creating a conda environment (10 mins)
An environment is a set of settings in which software versions and paths are configured for a particular program or group of programs. Unlike your general workspace, an environment must be activated ('sourced') before use, and programs installed into an environment will not disrupt your general workspace.

Here we show an example of creating an environment for albacore:

```bash
# Use 3.5 on our Linux server (or on Windows); 3.6 also works on Mac
PYTHON_VERSION=3.5
conda create --name albacore_env python=${PYTHON_VERSION} anaconda
```
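
The Python version can also be chosen automatically from the operating system. This is a rough sketch based on the versions listed on the previous slide (3.6 on Mac, 3.5 elsewhere); `albacore_python_version` is a hypothetical helper name.

```bash
# Print the Python version albacore needs on this OS.
# $1 = OS name (defaults to the output of `uname -s`)
albacore_python_version() {
    local os="${1:-$(uname -s)}"
    case "$os" in
        Darwin) echo "3.6" ;;  # Mac wheel is built for python3.6
        *)      echo "3.5" ;;  # Linux (and Windows) need python3.5
    esac
}

PYTHON_VERSION="$(albacore_python_version)"
# Show the command we would run, without running it
echo "conda create --name albacore_env python=${PYTHON_VERSION} anaconda"
```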

---

# Installing albacore in the conda environment 
.small[
Now that we have our albacore environment, we must 'source' (activate) it.

If you can't remember the name of an environment, you can see all your installed environments using:

```bash
conda info --envs
```

Now activate this environment and install the .whl file using pip:

```bash
# Activate environment
source activate albacore_env
# Update the standard conda library (especially important when using python 3.5)
conda update --all
# Create a standard yaml file
conda env export > standard.yaml
# Download the albacore pip wheel (run only ONE of the two wget lines)
# Mac:
wget https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp36-cp36m-macosx_10_11_x86_64.whl
# Linux:
wget https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp35-cp35m-manylinux1_x86_64.whl
# Install albacore using pip
pip install ont_albacore-*.whl  # The star is a wildcard that matches the wheel we downloaded
# Write what we have installed to file
conda env export > albacore.yaml
# Deactivate the albacore environment
source deactivate
```
]
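
Before creating an environment it can help to check whether it already exists. The sketch below reads an environment listing from standard input and looks for a name in the first column. The `env_listed` helper and the sample listing (including its paths) are made up for illustration; with conda installed, the real listing comes from `conda info --envs`.

```bash
# Succeed if the environment name in $1 appears in the first column
# of a `conda info --envs` style listing read from standard input.
env_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# With conda installed: conda info --envs | env_listed albacore_env
sample_listing='# conda environments:
#
albacore_env             /home/user/anaconda3/envs/albacore_env
root                  *  /home/user/anaconda3'

if echo "$sample_listing" | env_listed albacore_env; then
    echo "albacore_env already exists"
fi
```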

???

---

# Installing our other tools.
It may be wise to keep albacore as its own environment, and have our other tools in a separate environment.
Albacore changes quickly, with frequent upgrades.

```bash
# Create a new environment
conda create --name nanopore_tools_env python=3.6 anaconda
# Activate the environment
source activate nanopore_tools_env
# Update the standard conda library
conda update --all
# Create a standard .yaml file (picture of the blank environment)
conda env export > standard_env.3.6.yaml
```

Now let's use conda to install some more analysis tools.

---

# Installing useful nanopore tools.

We use minimap2 from Heng Li to align these long inaccurate reads to our genome.

```bash
# Pauvre, for viewing quality and read-length distributions.
conda install -c bioconda pauvre -y
# bwa-mem and minimap2 alignment
conda install -c bioconda bwa -y
conda install -c bioconda minimap2 -y
# Samtools and bamtools for sorting and assessing alignments
conda install -c bioconda samtools -y
conda install -c bioconda bamtools -y
# Assemblers:
conda install -c bioconda unicycler -y
conda install -c bioconda canu -y
```

---

# Check what we have installed

```bash
conda env export > nanopore.3.6.yaml
# Lines starting with '<' appear only in nanopore.3.6.yaml, i.e. what we have added
diff nanopore.3.6.yaml standard_env.3.6.yaml | grep '^<' > requirements.txt
cat requirements.txt
# Return to normal environment
source deactivate
```
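
An alternative way to list what was added is to keep only the dependency lines of the new export that are absent from the baseline export. This is a sketch under the assumption that `conda env export` writes one `- package=version=build` line per package; `new_packages` is a hypothetical helper, and a newly added channel (such as bioconda) would also show up in its output.

```bash
# Print the entries present in the second export but not in the first.
# $1 = baseline export, $2 = export taken after installing the tools
new_packages() {
    grep -vxFf "$1" "$2" |            # drop lines identical to the baseline
        grep -E '^[[:space:]]*- ' |   # keep only list entries
        sed -E 's/^[[:space:]]*- //'  # strip the leading "- "
}

# Only run when both export files are present in the current directory
if [ -f standard_env.3.6.yaml ] && [ -f nanopore.3.6.yaml ]; then
    new_packages standard_env.3.6.yaml nanopore.3.6.yaml
fi
```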

---

# Transferring data across

On the laptop running the MinION we will need to do the following.
Note: `sample_name` is the name specified when using MinKNOW.

```bash
source activate nanopore_tools_env
git clone https://github.com/alexiswl/poreduck.git
# To run the transfer script we will need to type in the following:
./poreduck/transfer_fast5_to_server.py \
--reads_dir </path/to/reads> \
--server_name <your_hpc> \
--user_name <user_on_hpc> \
--dest_directory </path/to/dest/on/hpc> \
--sample_name <name_of_sample> 
```

---

# Running albacore

Albacore has two main commands.  
* `read_fast5_basecaller.py`
* `full_1d2_analysis.py`

The second one is only needed when using the SQK-LSK308 (1D²) kit.

We can check the options by typing the following into the terminal:

```bash
source activate albacore_env
read_fast5_basecaller.py --help
```

---

# Standard albacore options that are required.

```bash
read_fast5_basecaller.py \
--input <path/to/fast5/files> \
--worker_threads <number_of_threads_used> \
--save_path </path/to/albacore/dir/> \
--flowcell <flowcell_version> \
--kit <kit_version>
```
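
To avoid retyping the required options, the command can be wrapped in a small function that prints it first as a dry run. This is only a sketch: `basecall_cmd` is a made-up helper, and the flowcell and kit values are the ones used elsewhere in these slides, so swap in your own.

```bash
# Print a basecalling command for inspection before running it.
# $1 = fast5 directory, $2 = output directory, $3 = threads (default 4)
basecall_cmd() {
    local input_dir="$1" save_dir="$2" threads="${3:-4}"
    echo read_fast5_basecaller.py \
        --input "$input_dir" \
        --worker_threads "$threads" \
        --save_path "$save_dir" \
        --flowcell FLO-MIN106 \
        --kit SQK-LSK108
}

# Dry run: shows the full command without starting albacore
basecall_cmd /path/to/fast5 /path/to/albacore_out 8
```

Once the printed command looks right, remove the `echo` inside the function to run it for real.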

---

# Poreduck: albacore_server_scaled.py

```bash
# Download poreduck
git clone https://github.com/alexiswl/poreduck.git
# Update poreduck
cd poreduck
git pull origin master
```

---

# Poreduck: albacore_server_scaled.py

```bash
# Getting help
albacore_server_scaled.py --help
# Run albacore through poreduck
albacore_server_scaled.py \
--reads_dir </minion/directory> \
--kit SQK-LSK108 \
--flowcell FLO-MIN106 \
--num_threads 5 \
--max_processes 10
```

---

# Introduction to qsub
qsub is how many users interact with an HPC, such as Milton.  
qsub allocates partitions of the server to users in a 'fair' manner.  
To submit a job, we pipe the albacore command into qsub, which schedules the albacore job to run on the cluster.
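
The piping step can be sketched as follows. Treat this as a hedged example: `make_job` and the job name given to `-N` are made up, and resource flags are left out because they differ between schedulers. When `qsub` is not available, the job script is printed instead of submitted.

```bash
# Emit a small job script on standard output; $1 is the command to run.
make_job() {
    printf '#!/bin/bash\nsource activate albacore_env\n%s\n' "$1"
}

job="$(make_job 'read_fast5_basecaller.py --help')"

if command -v qsub >/dev/null 2>&1; then
    # qsub reads the job script from standard input
    echo "$job" | qsub -N albacore
else
    # No scheduler here: just show what would be submitted
    echo "$job"
fi
```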

---
# Pauvre
.pull-left[
* Yield and read-length distribution plots.
* Statistic plots
]
.pull-right[
<img src=images/pauvre_example.png width="100%">
]

```bash
# Create a margin plot
pauvre marginplot --fastq <input_fastq_file>
# Create a summary file.
pauvre stats --fastq <input_fastq_file>
```
---

# Aligning to the genome

```bash
# Create a reference index
minimap2 -x map-ont -d reference_index /path/to/reference_genome
# Use minimap2 to align to the genome. -a outputs SAM; the index takes the place of the reference.
minimap2 -ax map-ont reference_index /path/to/fastq > alignment.sam
# The output is a sam file. We should convert this to a bam file and sort it.
samtools view -b -o alignment.bam alignment.sam
# Now sort and index the bam file
samtools sort -o alignment.sorted.bam alignment.bam
samtools index alignment.sorted.bam alignment.sorted.bai
```
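
The intermediate sam and unsorted bam files can be skipped by piping minimap2 straight into `samtools sort`. This is a minimal sketch; `sorted_bam_name` and `align_and_sort` are hypothetical helper names, and the output file is named after the input fastq.

```bash
# Derive the output name from the fastq name, e.g. reads.fastq -> reads.sorted.bam
sorted_bam_name() {
    local base
    base="$(basename "$1")"
    echo "${base%.fastq}.sorted.bam"
}

# Align, sort and index in one go.
# $1 = minimap2 index (or reference fasta), $2 = fastq file
align_and_sort() {
    local index="$1" fastq="$2" out
    out="$(sorted_bam_name "$fastq")"
    # "-" tells samtools sort to read the alignments from standard input
    minimap2 -ax map-ont "$index" "$fastq" | samtools sort -o "$out" -
    samtools index "$out"
}

# Usage: align_and_sort reference_index /path/to/reads.fastq
# Show the file name that call would write to
sorted_bam_name /path/to/reads.fastq
```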

---

# Canu: de novo assembly.

Canu is a de novo assembler, i.e. it assembles a genome without using a reference.  
It requires the user to have an estimate of the genome size prior to use.  
It is designed for long, inaccurate reads.  
It corrects and trims the reads, then assembles the genome.

```bash
# -p sets the prefix of the output files (the name here is only an example)
canu \
-p canu_assembly \
-d canu_assembly_directory \
genomeSize=3g \
-nanopore-raw *.fastq
```

Full documentation at:
http://canu.readthedocs.org/en/latest/
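
With a genome size estimate in hand, a quick sanity check before assembling is to work out the read coverage. The sketch below assumes plain four-line fastq records (one sequence line per read); `fastq_bases` and `coverage` are made-up helper names.

```bash
# Sum the lengths of all sequence lines (line 2 of every 4-line record).
# Reads the files given as arguments, or standard input when there are none.
fastq_bases() {
    awk 'NR % 4 == 2 { n += length($0) } END { print n + 0 }' "$@"
}

# Coverage = total bases / genome size.
# $1 = total bases, $2 = genome size in bases
coverage() {
    awk -v b="$1" -v g="$2" 'BEGIN { printf "%.1f\n", b / g }'
}

# Usage: coverage "$(fastq_bases *.fastq)" 3000000000
# Tiny demo with two reads of 4 and 6 bases
printf '@r1\nACGT\n+\n!!!!\n@r2\nACGTAC\n+\n!!!!!!\n' | fastq_bases
```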

---

# Unicycler: for hybrid assembly.

Unicycler differs from canu in that it also takes in short reads, which it uses to polish the genome.

To use Unicycler for hybrid assembly, you must have short-read Illumina data for the same sample.

```bash
unicycler \
-1 short_reads_1.fastq.gz \
-2 short_reads_2.fastq.gz \
-l long_reads.fastq.gz \
-o output_dir
```

Full documentation at:
https://github.com/rrwick/Unicycler