Recent Changes

Wednesday, January 17

  1. page Introduction to basic Unix commands edited ... 1 gene 2 transcript join: join is used to join different files together by a common column.…
    ...
    1 gene
    2 transcript
    join: join is used to join different files together by a common column.
    Now let's first make a file called "chromosome_length.txt" and copy the following lines into it. The file has two columns: the first is the chromosome name and the second is the length of that chromosome.
    [cgrlunix@poset ~]$ nano chromosome_length.txt
    1 18273
    2 22232
    3 12322
    4 9763
    MT 16153
    X 26172
    Y 12736
    Our goal is to append the chromosome length to each feature in the BED file "gene.bed". To do this, we join the two tabular files on their common column, the one containing the chromosome names (the first column in both gene.bed and chromosome_length.txt). Both files must first be sorted by the column to be joined on. This is a vital step: Unix's join will not work correctly unless both files are sorted by the join column, which we can do with sort. Once both are sorted, we use join to append the chromosome lengths to the features in gene.bed. The basic syntax is join -1 <file_1_field> -2 <file_2_field> <file_1> <file_2>, where <file_1> and <file_2> are the two files to be joined on column <file_1_field> of <file_1> and column <file_2_field> of <file_2>. So, with gene.bed and chromosome_length.txt this would be:
    [cgrlunix@poset ~]$ join -1 1 -2 1 gene.bed chromosome_length.txt > gene_length.txt
    [cgrlunix@poset ~]$ less gene_length.txt
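The sort-then-join steps can be tried end to end on toy data. This is a minimal sketch, assuming a small three-column BED-like file; the demo_* file names and all coordinates are made up for illustration and are not part of the workshop data.

```shell
# Work in a scratch directory so the real gene.bed is not touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical BED-like features: chromosome, start, end (tab-separated).
printf 'X\t100\t200\n1\t500\t900\nMT\t10\t60\n1\t50\t80\n' > demo_gene.bed
# Hypothetical chromosome lengths: chromosome, length.
printf '1 18273\nMT 16153\nX 26172\n' > demo_length.txt

# join needs both inputs sorted on the join field, in the same collation order.
LC_ALL=C sort -k1,1 demo_gene.bed > demo_gene.sorted.bed
LC_ALL=C sort -k1,1 demo_length.txt > demo_length.sorted.txt

# Join on column 1 of both files; the length is appended to each feature.
joined=$(LC_ALL=C join -1 1 -2 1 demo_gene.sorted.bed demo_length.sorted.txt)
printf '%s\n' "$joined"
```

Each output line carries the chromosome, start and end of a feature followed by the chromosome length. Setting LC_ALL=C for both sort and join is a choice made here so that the two tools agree on the ordering.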

    {} (Brace expansion is a mechanism by which arbitrary strings may be generated)
    {A,B,C} expands all the elements within the braces
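A quick way to see brace expansion is to echo an expanded pattern. This is a minimal sketch, assuming bash is installed; the sample_ and chr names are hypothetical, and the commands are run through bash -c only so that they behave the same if your login shell is not bash.

```shell
# Brace expansion is performed by bash before the command runs,
# so echo receives three separate arguments.
expanded=$(bash -c 'echo sample_{A,B,C}.txt')
printf '%s\n' "$expanded"

# A numeric range also expands: {1..3} generates 1 2 3.
range=$(bash -c 'echo chr{1..3}')
printf '%s\n' "$range"
```

At an interactive bash prompt the same patterns can be typed directly, for example echo sample_{A,B,C}.txt.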
    (view changes)
    11:07 pm
  2. page Introduction to basic Unix commands edited ... 893 transcript note: the 3rd column contains the features of the gtf ... to count combina…
    ...
    893 transcript
    note: the 3rd column contains the features of the gtf
    ...
    to count combinations. In this example we would like to count how many of each feature
    [cgrlunix@poset ~]$ tail -n +6 Homo_sapiens.GRCh38.81.gtf | cut -f3,7 | sort | uniq -c
    1303 CDS +
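The same cut | sort | uniq -c pipeline can be checked on a few hand-made rows. This is a minimal sketch, assuming tab-separated GTF-like input where column 3 is the feature and column 7 is the strand; the rows and the demo_gtf variable are invented for illustration.

```shell
# Hypothetical GTF-like rows (tab-separated, 9 columns per row).
# printf reuses the format string for every group of 9 arguments.
demo_gtf=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 demo gene 100 900 . + . 'gene_id "g1";' \
  1 demo exon 100 300 . + . 'gene_id "g1";' \
  1 demo exon 500 900 . + . 'gene_id "g1";' \
  2 demo gene 50 400 . - . 'gene_id "g2";' \
  2 demo exon 50 400 . - . 'gene_id "g2";')

# Keep feature and strand, sort so identical pairs are adjacent, then count.
counts=$(printf '%s\n' "$demo_gtf" | cut -f3,7 | LC_ALL=C sort | uniq -c)
printf '%s\n' "$counts"
```

Each output line is a count followed by one feature and strand combination, in the same layout as the "1303 CDS +" line from the real file.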
    (view changes)
    10:50 pm
  3. page Introduction to basic Unix commands edited ... gene6 F You can also get some statistics out of uniq with the -c option ... | uniq -dc -…
    ...
    gene6 F
    You can also get some statistics out of uniq with the -c option
    ...
    | uniq -c
    5 gene1 A
    5 gene2 B
    1 gene3 C
    3 gene4 D
    1 gene5 E
    6 gene6 F
    Both sort | uniq and sort | uniq -c are frequently used shell idioms in bioinformatics and worth memorizing. Combined with other Unix tools like grep and cut, sort and uniq can be used to summarize columns of tabular data. Now we want to count the number of records for each feature present in the gtf file:
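Why sort comes before uniq in these idioms is easiest to see on input whose duplicates are not adjacent. This is a minimal sketch on made-up records; the variable names are hypothetical.

```shell
# Hypothetical two-column records with repeats that are not adjacent.
records=$(printf '%s\n' 'gene1 A' 'gene2 B' 'gene1 A' 'gene3 C' 'gene2 B' 'gene1 A')

# uniq alone only collapses consecutive duplicates, so nothing is removed here.
unsorted_unique=$(printf '%s\n' "$records" | uniq | wc -l | tr -d ' ')

# Sorting first makes duplicates adjacent, so uniq leaves one line per record.
sorted_unique=$(printf '%s\n' "$records" | LC_ALL=C sort | uniq)

# Adding -c prefixes each distinct record with its number of occurrences.
sorted_counts=$(printf '%s\n' "$records" | LC_ALL=C sort | uniq -c)

printf 'lines after uniq alone: %s\n' "$unsorted_unique"
printf '%s\n' "$sorted_unique"
printf '%s\n' "$sorted_counts"
```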
    (view changes)
    10:37 pm
  4. page Introduction to basic Unix commands edited ... #!genome-build-accession NCBI:GCA_000001405.18 #!genebuild-last-updated 2015-06 ... from t…
    ...
    #!genome-build-accession NCBI:GCA_000001405.18
    #!genebuild-last-updated 2015-06
    ...
    from the gtf, and then
    ...
    cut -f1,4,5 | sort -k1,1 -k2,2n > gene.bed
    Here, we specify the columns (and their order) we want to sort by as -k arguments. In technical terms, -k specifies the sorting keys and their order. Each -k argument takes a range of columns as start,end, so to sort by a single column we use start,start. In the example above, we first sorted by the first column (chromosome), since the first -k argument was -k1,1. Sorting by the first column alone leads to many ties among rows with the same chromosome (e.g. “1” and “MT”). Adding a second -k argument with a different column tells sort how to break these ties. In our example, -k2,2n tells sort to sort by the second column (start position), treating this column as numerical data (since there’s an n in -k2,2n).
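The effect of the two keys, and of the n modifier, can be seen on a handful of rows. This is a minimal sketch on invented BED-like data; LC_ALL=C is set here only to make the text ordering of chromosome names predictable.

```shell
# Hypothetical BED-like rows: chromosome, start, end (tab-separated).
rows=$(printf '%s\t%s\t%s\n' 2 300 400  1 1000 2000  MT 50 90  1 200 300  1 30 80)

# -k1,1 sorts by chromosome as text; -k2,2n breaks ties by start as a number.
by_chrom_then_start=$(printf '%s\n' "$rows" | LC_ALL=C sort -k1,1 -k2,2n)
printf '%s\n' "$by_chrom_then_start"

# For contrast: without n the start column is compared as text,
# so 1000 sorts before 200 and 200 before 30.
text_order=$(printf '%s\n' "$rows" | LC_ALL=C sort -k1,1 -k2,2)
printf '%s\n' "$text_order"
```

In the first result the three rows for chromosome 1 appear with starts 30, 200, 1000; in the second they appear as 1000, 200, 30, which is why the n matters for coordinates.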

    \b matches the empty string at the edge of a word. It sets a boundary to the matches.
    It’s also possible to sort in reverse order with the -r option, added here to the second key as the r in -k2,2nr:
    [cgrlunix@poset ~]$ tail -n +6 Homo_sapiens.GRCh38.81.gtf | grep "\bexon\b" | cut -f1,4,5 | sort -k1,1 -k2,2nr | less -S
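Both pieces of this command, the \b word boundary and the reversed numeric key, can be checked on a few rows. This is a minimal sketch, assuming a grep that supports \b (GNU grep does); the rows are invented, and the CDS row is included because its attribute column contains "exon_number".

```shell
# Hypothetical GTF-like rows (tab-separated, 9 columns per row).
gtf_rows=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 demo exon 100 300 . + . 'exon_number "1";' \
  1 demo CDS 150 300 . + 0 'exon_number "1";' \
  1 demo exon 500 900 . + . 'exon_number "2";')

# A plain pattern also matches "exon" inside "exon_number" on the CDS row.
plain_matches=$(printf '%s\n' "$gtf_rows" | grep -c 'exon')

# With \b on both sides, "exon" must stand alone as a word ("_" is a word character).
word_matches=$(printf '%s\n' "$gtf_rows" | grep -c '\bexon\b')

# Reverse numeric order on the start column: largest start first.
exons_desc=$(printf '%s\n' "$gtf_rows" | grep '\bexon\b' | cut -f1,4,5 | LC_ALL=C sort -k1,1 -k2,2nr)

printf 'plain: %s, whole word: %s\n' "$plain_matches" "$word_matches"
printf '%s\n' "$exons_desc"
```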

    uniq (when fed a text file, outputs the file with consecutive identical lines collapsed to one)
    First, create a new file uniq_test.txt using nano and copy the following content into this file
    (view changes)
    10:16 pm

Tuesday, January 16

  1. page Introduction to basic Unix commands edited ... 22 3.1 Redirecting the Output ... UNIX commands write to the standard output (that is, th…
    ...
    22
    3.1 Redirecting the Output
    ...
    UNIX commands write to the standard output (that is, they write to the terminal screen), and many take their input from
    ...
    from the keyboard) and write to the standard output (that is, they write to the terminal). There is
    ...
    the terminal screen. Sometimes you may wish to save the output in a file rather than just print it in the terminal; in this case we need to redirect the output.
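Redirection can be tried safely in a scratch directory. This is a minimal sketch; the file name demo_output.txt is hypothetical.

```shell
# Use a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)
outfile="$workdir/demo_output.txt"

# > sends standard output to a file instead of the terminal, overwriting the file.
echo "first line" > "$outfile"

# >> appends to the file rather than overwriting it.
echo "second line" >> "$outfile"

# Print the saved output back to the terminal.
cat "$outfile"
```

Because > overwrites without asking, it is worth checking the target file name before redirecting into a file you care about.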
    echo (writes its arguments to standard output)
    In a terminal window type
    (view changes)
    10:32 pm

Friday, January 12

  1. page Introduction of basic Unix commands edited ... 1.1 Listing files and directories ls (list) ... home directory. Your home directory has …
    ...
    1.1 Listing files and directories
    ls (list)
    ...
    home directory. Your home directory has the same name as your user-name, for example, lebronjames, and it is where your personal files and subdirectories are saved.
    To find out what is in your home directory, type
    [workshop@poset ~]$ ls
    (view changes)
    4:50 pm
  2. page Spring 2018 Workshops edited ... 4. 16s amplicon sequencing data analysis workshop 2/27/2018 5. RNAseq workshop TBD 6. NCBI …
    ...
    4. 16s amplicon sequencing data analysis workshop 2/27/2018
    5. RNAseq workshop TBD
    6. NCBI workshop 4/5/2018
    (view changes)
    4:17 pm

Monday, January 8

  1. page Spring 2018 Workshops edited ... 2. Ensembl workshop 1/23/2018 3. Kbase workshop TBD ... analysis workshop TBD 2/27/2018 …
    ...
    2. Ensembl workshop 1/23/2018
    3. Kbase workshop TBD
    ...
    analysis workshop 2/27/2018
    5. RNAseq workshop TBD
    (view changes)
    12:03 pm
