Gaining Access to Sequencing Data on the HPC

To access your sequencing data on the NYU HPC cluster, you will need to:

  1. Create an HPC account by visiting the NYU High Performance Computing Wiki and following the account creation procedures.
  2. Submit a request through the Biology Computation Support Form to join the CGSB Linux working group, which grants access to your lab's sequencing results directory.
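After your group request has been processed, you can check that your account can actually read the delivered data. The Python sketch below shows one way to do that; it is not an official CGSB tool. The lab subdirectory name mylab and the .fastq / .fastq.gz extensions are assumptions, so substitute your lab's actual directory and file naming.

```python
"""Check that your account can read a lab's sequencing directory.

Sketch only. The lab name "mylab" and the FASTQ extensions are assumptions.
"""
from __future__ import annotations

import os
from pathlib import Path

# Delivery root named in this guide; the lab subdirectory below it is hypothetical.
DELIVERY_ROOT = Path("/projects/rps/cgsb")


def readable_fastqs(lab_dir: Path) -> list[Path]:
    """Return the FASTQ files under lab_dir that the current user can read.

    Raises FileNotFoundError if lab_dir is missing or not visible, and
    PermissionError if it exists but cannot be listed (for example, because
    the group-membership request has not been processed yet).
    """
    if not lab_dir.is_dir():
        raise FileNotFoundError(f"{lab_dir} does not exist or is not visible to you")
    # Listing a directory needs both read (R) and search/execute (X) permission.
    if not os.access(lab_dir, os.R_OK | os.X_OK):
        raise PermissionError(f"no read access to {lab_dir}")
    files: list[Path] = []
    for pattern in ("*.fastq", "*.fastq.gz"):
        files.extend(
            p for p in lab_dir.rglob(pattern)
            if p.is_file() and os.access(p, os.R_OK)
        )
    return sorted(files)


if __name__ == "__main__":
    lab_dir = DELIVERY_ROOT / "mylab"  # hypothetical: replace with your lab's directory
    try:
        fastqs = readable_fastqs(lab_dir)
        print(f"{len(fastqs)} readable FASTQ files under {lab_dir}")
    except (FileNotFoundError, PermissionError) as err:
        print(f"Access problem: {err}")
```

A PermissionError here usually means the group change has not reached your account yet; logging out and back in picks up new group memberships on most Linux systems.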

Data Policy and Retention

  • Demultiplexed and raw lane FASTQ files are transferred to lab directories under /projects/rps/cgsb on the HPC.
  • Lab owners receive read access to fastq files in this location.
  • Data in lab directories is backed up and is not subject to deletion.
  • Raw and processed sequencing directories are archived and retained for a minimum of five years.
  • Raw sequencing directories are available upon request.
  • Lab shares are retained for up to three years after the PI leaves CGSB.

HPC Best Practices

  • Run jobs and save output in your personal scratch directory: /scratch/netID/my-project/job-xyz/
  • Store Slurm scripts in the job or project directory so that run parameters can be verified and results reproduced.
  • Keep personal scripts (Python, executables) in your home folder: /home/netID/
  • In Slurm submissions, refer to scripts in your home directory through the $HOME variable.
  • Check the existing software modules with module avail; to request a package that is not available, email hpc@nyu.edu.
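These practices can be combined in a small helper that creates a per-job directory in scratch and writes the Slurm script into it, next to the job's output. The Python sketch below is illustrative only: the resource requests, the module name, and the script $HOME/scripts/align.py are hypothetical placeholders, so check module avail and your own job's requirements before adapting it.

```python
"""Write a Slurm batch script into a per-job scratch directory.

Sketch only. Resource requests, module name, and the command are
hypothetical placeholders.
"""
from pathlib import Path


def write_job(scratch_root: Path, project: str, job: str,
              module: str, command: str) -> Path:
    """Create <scratch_root>/<project>/<job>/ and write job.sbatch inside it.

    Keeping the .sbatch file next to the job's output makes it easy to check
    later exactly which parameters produced that output.
    """
    job_dir = scratch_root / project / job
    job_dir.mkdir(parents=True, exist_ok=True)
    script = job_dir / "job.sbatch"
    script.write_text(
        "#!/bin/bash\n"
        f"#SBATCH --job-name={job}\n"
        "#SBATCH --nodes=1\n"
        "#SBATCH --cpus-per-task=4\n"   # hypothetical resource request
        "#SBATCH --mem=16G\n"           # hypothetical resource request
        "#SBATCH --time=04:00:00\n"
        f"#SBATCH --output={job_dir}/%x_%j.out\n"  # %x = job name, %j = job ID
        "\n"
        f"module load {module}\n"
        f"cd {job_dir}\n"
        # Python does not expand $HOME here; bash expands it when the job runs.
        f"{command}\n"
    )
    return script


if __name__ == "__main__":
    import getpass
    user = getpass.getuser()  # on the cluster this is normally your NetID
    path = write_job(
        Path("/scratch") / user, "my-project", "job-xyz",
        module="python/3.8.6",  # hypothetical module name; check module avail
        command="python $HOME/scripts/align.py",  # hypothetical script
    )
    print(f"Submit with: sbatch {path}")
```

Leaving $HOME unexpanded in the command keeps the generated script free of hard-coded home paths, so the same generator works for any account.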

Important HPC Locations

Directory               Path                             Quota               Flushing         Backup  Purpose
GenCore FASTQ Delivery  /projects/rps/cgsb/gencore/out/  -                   Protected        Yes     Sequencing output files
Lab Share               /projects/cgsb/                  Subject to charges  Protected        Cloud   Lab collaboration and results sharing
Personal Scratch        /scratch/netID/                  5 TB                60-day deletion  No      Active analysis work
Personal Home           /home/netID/                     50 GB               Protected        Yes     Custom scripts and tools
Personal Archive        /archive/netID/                  2 TB                Protected        Yes     Completed archived analyses
Shared Genomes          /projects/work/cgsb/genomes      -                   Protected        Yes     Common genomic datasets
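Because files in personal scratch are deleted after 60 days and are not backed up, it can help to list files that are approaching that limit. The Python sketch below approximates the purge rule by each file's last-access time (st_atime). That criterion is an assumption and the cluster's actual rule may differ, so treat the output as an early warning rather than a guarantee. The 7-day warning window is an arbitrary choice.

```python
"""List scratch files that may be close to the 60-day flush.

Sketch only. Assumption: the purge is approximated by last-access time
(st_atime); the cluster's real rule may differ, and filesystems mounted
with noatime or relatime do not update access times on every read.
"""
from __future__ import annotations

import getpass
import os
import time
from pathlib import Path

DAY = 86_400  # seconds in a day


def at_risk(root: Path, flush_days: int = 60, warn_days: int = 7,
            now: float | None = None) -> list[tuple[Path, float]]:
    """Return (path, idle_days) for files not accessed in flush_days - warn_days days.

    Results are sorted with the longest-idle files first.
    """
    now = time.time() if now is None else now
    cutoff = now - (flush_days - warn_days) * DAY
    results: list[tuple[Path, float]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                atime = path.stat().st_atime
            except OSError:
                continue  # file vanished or cannot be stat'ed; skip it
            if atime < cutoff:
                results.append((path, (now - atime) / DAY))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumes $SCRATCH points at your scratch directory; falls back to /scratch/<user>.
    scratch = Path(os.environ.get("SCRATCH", f"/scratch/{getpass.getuser()}"))
    for path, idle in at_risk(scratch):
        print(f"{idle:5.1f} days idle  {path}")
```

Anything this reports that you still need belongs in your archive or lab share rather than in scratch.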

To establish a shared lab directory, please submit a Lab Share Directory Form.

For shared genome resources, see the Shared Genome Resource documentation.