FASTQ file name
FASTQ files use the following naming scheme:
- SampleName: To avoid conflicts with sample names from other customers, some user-submitted sample names may have been slightly modified. They should still be recognizable.
- SampleNumber: Order in which the sample appears in the run (0 if there is only one sample in the lane).
- ReadNumber: R1 for single-read runs and R1/R2 for paired-end runs (first/second read).
FASTQ files are delivered gzip-compressed, with the .gz file extension. Quality scores use the standard Sanger FASTQ encoding (the Phred score plus 33, stored as a single ASCII character). For more information on the FASTQ format, refer to Wikipedia.
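As an illustration of the Sanger encoding, the following minimal Python sketch converts a quality string into integer Phred scores by subtracting the ASCII offset of 33. The function names are hypothetical and chosen only for this example.

```python
def decode_phred33(quality_line):
    """Convert a Sanger-encoded quality string into a list of Phred scores."""
    # Each character encodes one score as chr(score + 33).
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(score):
    """Estimated probability that a base call with this Phred score is wrong."""
    return 10 ** (-score / 10)


# 'I' is ASCII 73 (score 40), '5' is ASCII 53 (score 20), '!' is ASCII 33 (score 0).
scores = decode_phred33("II5!")
```

A Phred score of 20 corresponds to an estimated error probability of 1 in 100, and a score of 40 to 1 in 10,000.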
The FASTQ format uses four lines per sequence.
- Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (header).
- Line 2 contains the raw sequence letters.
- Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
- Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
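The four-line layout can be read with a short Python sketch like the one below. It is only an illustration under simple assumptions: every record occupies exactly four lines (no wrapped sequences, no trailing blank lines), and the function names `read_fastq` and `read_fastq_gz` are hypothetical.

```python
import gzip
from itertools import islice


def read_fastq(handle):
    """Yield (identifier, sequence, quality) tuples from an iterable of lines."""
    lines = iter(handle)
    while True:
        # Take the next four lines; an empty result means end of input.
        record = [line.rstrip("\r\n") for line in islice(lines, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, sequence, plus, quality = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        # Drop the leading '@' from the identifier line.
        yield header[1:], sequence, quality


def read_fastq_gz(path):
    """Read records from a gzip-compressed FASTQ file, opened in text mode."""
    with gzip.open(path, "rt") as handle:
        yield from read_fastq(handle)
```

Because `read_fastq` is a generator, records are processed one at a time and large files do not need to fit in memory.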
The FASTQ header (line 1) contains several fields separated by either ':' or a space. The fields are listed below in order; the value in parentheses is taken from an example HiSeq FASTQ header:
- @ - Each sequence identifier line starts with @.
- InstrumentID - unique identifier of the sequencer (M00329).
- RunNumber - run number on the instrument (2).
- Flowcell_ID - ID of the flowcell (000000000-A0HGR).
- LaneNumber - positive integer, currently 1-8 (1).
- TileNumber - positive integer (1).
- X - x coordinate of the spot; an integer, which can be negative (16318).
- Y - y coordinate of the spot; an integer, which can be negative (1464).
- ReadNumber - 1 for single reads; 1 or 2 for paired-end reads (1).
- IsFiltered - Y if the read was filtered out (such reads are not in the delivered FASTQ file), N otherwise (N).
- ControlNumber - 0 when none of the control bits are on, otherwise an even number (0).
- SampleNumber - order in which the sample appears in the run.
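The field list above can be turned into a small header parser. The Python sketch below is illustrative only: it assumes the usual Illumina layout in which a single space separates the first seven fields from the last four and ':' separates fields within each group. The example header is assembled from the values in the list, with a hypothetical SampleNumber of 1, and the `FastqHeader` and `parse_header` names are invented for this example.

```python
from typing import NamedTuple


class FastqHeader(NamedTuple):
    instrument_id: str
    run_number: int
    flowcell_id: str
    lane_number: int
    tile_number: int
    x: int
    y: int
    read_number: int
    is_filtered: bool
    control_number: int
    sample_number: str  # kept as text; not assumed to be numeric


def parse_header(line):
    """Split a FASTQ header line into its named fields."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    # Assumption: one space separates the two colon-delimited groups.
    left, _, right = line[1:].strip().partition(" ")
    left_fields = left.split(":")
    right_fields = right.split(":")
    if len(left_fields) != 7 or len(right_fields) != 4:
        raise ValueError(f"unexpected header layout: {line!r}")
    instrument, run, flowcell, lane, tile, x, y = left_fields
    read, filtered, control, sample = right_fields
    if filtered not in ("Y", "N"):
        raise ValueError(f"unexpected filter flag: {filtered!r}")
    return FastqHeader(
        instrument, int(run), flowcell, int(lane), int(tile), int(x), int(y),
        int(read), filtered == "Y", int(control), sample,
    )


# Example header built from the values listed above; the trailing
# SampleNumber (1) is a hypothetical value.
header = parse_header("@M00329:2:000000000-A0HGR:1:1:16318:1464 1:N:0:1")
```

Using `int()` for the coordinates means negative X and Y values are handled without special treatment.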