NovaSeq results

Run modes

A NovaSeq flow cell has two or four lanes, depending on the flow cell type. We usually run in Xp mode, where each lane is loaded with a different library or pool of libraries. You will receive one set of files for each lane, and the lane number is part of the file names: look for _L00n_, where n is the lane number. If you order a half or quarter flow cell, we will always run in Xp mode.

We may run in Standard mode if the same library (or pool) is sequenced on the full flow cell. In that case, the flow cell is treated as a single unit, and the file names will not contain the lane number.

 

FASTQ file name -- Xp mode

FASTQ files use the following naming scheme: 

<SampleName>_S<SampleNumber>_L00<LaneNumber>_R<ReadNumber>_001.fastq.gz

Example: NA10831_S1_L001_R1_001.fastq.gz

  • SampleName: To avoid conflicts with sample names from other customers, some user-submitted sample names may have been slightly modified. You should still be able to recognize them.
  • SampleNumber: Order in which the sample appears in the run (0 if there is only one sample in the lane).
  • LaneNumber: Lanes are independent sub-units of the flow cell. The SP, S1, S2 flow cells have two lanes, and the S4 flow cell has four lanes.
  • ReadNumber: R1 for single-read runs and R1/R2 for paired-end runs (first/second read). 
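
The Xp-mode naming scheme above can be split into its parts with a regular expression. The following Python code is a minimal sketch and not part of the delivery pipeline: the function name parse_xp_name is hypothetical, and the pattern assumes that the sample name may itself contain underscores.

```python
import re

# <SampleName>_S<SampleNumber>_L00<LaneNumber>_R<ReadNumber>_001.fastq.gz
XP_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)"
    r"_L00(?P<lane>\d)_R(?P<read>[12])_001\.fastq\.gz$"
)


def parse_xp_name(file_name):
    """Return the parts of an Xp-mode FASTQ file name as a dict."""
    match = XP_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not an Xp-mode FASTQ file name: {file_name}")
    return {
        "sample": match.group("sample"),
        "sample_number": int(match.group("sample_number")),
        "lane": int(match.group("lane")),
        "read": int(match.group("read")),
    }


print(parse_xp_name("NA10831_S1_L001_R1_001.fastq.gz"))
```

Because the sample-name group is greedy, a name such as my_sample_S1_L002_R2_001.fastq.gz is still split at the last _S<SampleNumber>_ part.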

FASTQ file name -- Standard mode

FASTQ files use the following naming scheme: 

<SampleName>_S<SampleNumber>_R<ReadNumber>_001.fastq.gz

Example: NA10831_S1_R1_001.fastq.gz

  • SampleName: To avoid conflicts with sample names from other customers, some user-submitted sample names may have been slightly modified. You should still be able to recognize them.
  • SampleNumber: Order in which the sample appears in the run (0 if there is only one sample in the flow cell).
  • ReadNumber: R1 for single-read runs and R1/R2 for paired-end runs (first/second read). 
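
For paired-end runs it is often useful to match each R1 file with its R2 partner. The following Python code is a minimal sketch under the naming scheme described here; the helper name pair_reads is hypothetical, and the lane part is made optional so that names carrying _L00n_ are accepted as well.

```python
import re
from collections import defaultdict

# Everything before _R<ReadNumber> identifies the sample (and lane, if present).
FASTQ_NAME = re.compile(
    r"^(?P<prefix>.+_S\d+(?:_L00\d)?)_R(?P<read>[12])_001\.fastq\.gz$"
)


def pair_reads(file_names):
    """Group FASTQ file names as {prefix: {read_number: file_name}}."""
    pairs = defaultdict(dict)
    for name in file_names:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        pairs[match.group("prefix")][int(match.group("read"))] = name
    return dict(pairs)
```

A prefix whose entry holds only read 1 then corresponds either to a single-read run or to a missing R2 file.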

FASTQ format

FASTQ files are delivered gzip-compressed (GNU zip), with the .gz file extension. Quality scores use the standard Sanger encoding (Phred+33). For more information on the FASTQ format, refer to Wikipedia.
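
In the Sanger encoding, each quality character stands for a Phred score Q equal to its ASCII code minus 33, and Q corresponds to an estimated error probability of 10^(-Q/10). The following Python code is a minimal sketch of that conversion; the function names and the example quality string are illustrative only.

```python
def phred_scores(quality_line):
    """Convert a Sanger-encoded quality string to a list of Phred scores."""
    return [ord(char) - 33 for char in quality_line]


def error_probability(phred_score):
    """Estimated probability that a base call with this score is wrong."""
    return 10 ** (-phred_score / 10)


print(phred_scores("FF:,#"))  # prints [37, 37, 25, 11, 2]
```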

FASTQ format uses four lines per sequence.

  • Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (header).
  • Line 2 is the raw sequence letters.
  • Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
  • Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
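
The four-line layout above makes the files easy to read record by record. The following Python code is a minimal sketch that uses only the standard library; the function name read_fastq is hypothetical, and the code assumes that sequences are not wrapped over several lines.

```python
import gzip


def read_fastq(path):
    """Yield (header, sequence, quality) from a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            plus = handle.readline().rstrip("\n")
            quality = handle.readline().rstrip("\n")
            # Line 1 must start with '@' and line 3 with '+'.
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed record near: {header}")
            # Line 4 must have one quality symbol per base in line 2.
            if len(sequence) != len(quality):
                raise ValueError(f"length mismatch in record: {header}")
            yield header[1:], sequence, quality
```

For example, sum(1 for _ in read_fastq("NA10831_S1_L001_R1_001.fastq.gz")) counts the reads in a file.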

The FASTQ header (line 1) contains several fields separated by either ':' or a space.

An example of a NovaSeq FASTQ header:

      @A00943:1:FCX:1:1101:6329:1045 1:N:0:CCGCGTT+NGCGCTAG

  • @ - Each sequence identifier line starts with @.
  • InstrumentID - unique identifier of the sequencer (A00943)
  • RunNumber - Run number on instrument (1).
  • Flowcell_ID - ID of flowcell (FCX).
  • LaneNumber - positive integer, currently 1-4 (1)
  • TileNumber - positive integer; tiles are physical sub-units of the lane (1101)
  • X - x coordinate of the spot within tile. Integer which can be negative (6329)
  • Y - y coordinate of the spot within tile. Integer which can be negative (1045)
  • ReadNumber - 1 for single reads; 1 or 2 for paired ends (1)
  • IsFiltered - Y if the read was filtered out (such reads are not included in the delivered FASTQ files), N otherwise (N)
  • ControlNumber - 0 when none of the control bits are on, otherwise it is an even number (0)
  • Index - actual bases sequenced in the index read(s). This will look different depending on which indexing mode was used for the run, and there may be mismatches relative to the expected sample index (CCGCGTT+NGCGCTAG).
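
The header fields listed above can be extracted by splitting first on the space and then on ':'. The following Python code is a minimal sketch that assumes exactly the layout of the example header; the function name parse_header and the dictionary keys are hypothetical.

```python
def parse_header(header):
    """Split a FASTQ header (line 1) into its named fields."""
    location, description = header.lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = location.split(":")
    read, is_filtered, control, index = description.split(":", 3)
    return {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read),
        "is_filtered": is_filtered == "Y",  # Y means the read was filtered out
        "control": int(control),
        "index": index,
    }


fields = parse_header("@A00943:1:FCX:1:1101:6329:1045 1:N:0:CCGCGTT+NGCGCTAG")
print(fields["lane"], fields["tile"], fields["index"])
```

A header with a different number of ':'-separated fields raises a ValueError, which makes unexpected layouts easy to spot.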
Published Jan. 22, 2020 12:22 PM - Last modified Jan. 22, 2020 12:47 PM