How to separate a file with multiple FASTA records into different variables


I have a file that contains several FASTA records, and I would like to put each one into a different variable, or even store them in an array and retrieve each one by number.

Each FASTA record starts with the > symbol, as in this example:

'>'Pvivax_1
AAGGTTT

'>'Pvivax_2
TTGGCCC
    
asked by anonymous 16.07.2014 / 19:03

3 answers


Since a FASTA file contains headers and sequence contents, it is useful to keep the two apart so you can retrieve whatever you need from each:

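#!/usr/bin/perl
use strict;
use warnings;

my $file = 'arquivo.fasta';
open my $info, '<', $file or die "Could not open file $file: $!";
my @cabecalho = ();   # header lines (they contain '>')
my @conteudo  = ();   # sequence lines
while ( my $linha = <$info> ) {
    chomp $linha;
    next unless $linha =~ /\S/;   # skip blank lines (keeps the arrays aligned when each sequence is one line)
    if ( $linha =~ />/ ) {
        push @cabecalho, $linha;
    } else {
        push @conteudo, $linha;
    }
}

close $info;

Then, to retrieve the first record:

print "$cabecalho[0]\n$conteudo[0]\n";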
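The question also asks about retrieving records by number. A minimal sketch, assuming one header line per record but allowing sequences that span several lines, keeps each header together with its sequence in an array of hashes. The name read_records and the in-memory sample data are hypothetical, used only so the example runs without arquivo.fasta:

```perl
use strict;
use warnings;

# Hypothetical helper: reads FASTA records from an open filehandle and
# returns an array of hashes { id => ..., seq => ... } in file order.
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;              # skip blank lines
        if ($line =~ /^'?>'?(.*)$/) {           # header: ">id", or "'>'id" as in the question
            push @records, { id => $1, seq => '' };
        } elsif (@records) {
            (my $chunk = $line) =~ s/\s+//g;    # drop stray whitespace
            $records[-1]{seq} .= $chunk;        # append to the current record
        }
    }
    return @records;
}

# Usage with an in-memory file (hypothetical sample data).
my $data = ">Pvivax_1\nAAGGTTT\n\n>Pvivax_2\nTTGGCCC\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @fasta = read_records($fh);
close $fh;

print "$fasta[1]{id}: $fasta[1]{seq}\n";   # second record, retrieved by index
```

Records are numbered from 0, so $fasta[0] is the first record and scalar(@fasta) gives the record count.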
    
16.07.2014 / 19:33

This method is fine as long as the file is not too large, since it loads every sequence into memory:

sub ler_fasta {
    my ($arq) = @_;            # path to the FASTA file
    my %seqs;
    my $header;
    my $seq = '';
    open my $in, '<', $arq or die "Failed to open file $arq: $!\n";
    while (<$in>) {
        if (/^>/) {
            if (defined $header) {
                $seqs{$header} = $seq;   # store the previous record
            }

            $header = $_;
            $header =~ s/^>//;     # remove the ">"
            $header =~ s/\s+$//;   # remove trailing spaces / tabs / newline

            $seq = "";             # start a new sequence
        } else {
            s/\s+//g;              # strip whitespace, including the newline
            $seq .= $_;            # append this line to the current sequence
        }
    }
    close $in;

    if (defined $header) {         # the last sequence
        $seqs{$header} = $seq;
    }

    return \%seqs;   # returns a reference to a hash of header => sequence
}

Reference.
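If the file is too large to hold in memory, one option is to handle each record as soon as it is complete instead of collecting everything in a hash. A sketch under that assumption; each_fasta and its callback interface are hypothetical names, not part of any library, and the sample data is made up:

```perl
use strict;
use warnings;

# Hypothetical streaming variant: calls $callback->($header, $seq) for each
# record as soon as it is complete, so only one sequence is held at a time.
sub each_fasta {
    my ($fh, $callback) = @_;
    my ($header, $seq);
    while (my $line = <$fh>) {
        if ($line =~ /^>/) {
            $callback->($header, $seq) if defined $header;   # flush the previous record
            ($header = $line) =~ s/^>//;                     # remove the ">"
            $header =~ s/\s+$//;                             # remove trailing whitespace
            $seq = '';
        } elsif (defined $header) {
            $line =~ s/\s+//g;                               # strip whitespace and newline
            $seq .= $line;
        }
    }
    $callback->($header, $seq) if defined $header;           # flush the last record
    return;
}

# Usage: compute sequence lengths without keeping the sequences.
my $data = ">Pvivax_1\nAAGG\nTTT\n>Pvivax_2\nTTGGCCC\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my %lengths;
each_fasta($fh, sub { my ($id, $s) = @_; $lengths{$id} = length $s });
close $fh;
printf "%s: %d bp\n", $_, $lengths{$_} for sort keys %lengths;
```

The callback decides what to keep, so memory use depends only on the longest single sequence rather than on the whole file.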

    
16.07.2014 / 19:26

(I know this question is old, but questions about something other than HTML and JS are so rare here that I couldn't resist...)

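#!/usr/bin/perl
use strict;
use warnings;

sub ler_fasta {
    my $file = shift;
    # record separator = '>' (quoted, as in the question's example);
    # for a standard FASTA file, use ">" instead
    local $/ = "'>'";
    my %val;

    open my $fasta, '<', $file or die "Could not open file $file: $!";
    while (my $reg = <$fasta>) {
        chomp $reg;                                  # remove the trailing separator
        my ($id, @linhas) = split /\n/, $reg;        # first line = identifier, rest = sequence
        next unless defined $id && length $id;       # skip the empty chunk before the first separator
        ($val{$id} = join '', @linhas) =~ s/\s+//g;  # join multi-line sequences, drop whitespace
    }
    close $fasta;
    return \%val;
}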

This way each sequence is associated with its identifier (e.g. my $val = ler_fasta("fasta.txt"); print $val->{Pvivax_1};).

use Data::Dumper;
print Dumper( ler_fasta("fasta.txt") );

prints:

$VAR1 = { 'Pvivax_2' => 'TTGGCCC',
          'Pvivax_1' => 'AAGGTTT'
        };
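The same record-separator idea can be adapted to a standard FASTA file, where each header begins with a bare > at the start of a line. A sketch, assuming that layout; read_fasta_records and the sample data are hypothetical. Setting $/ to "\n>" rather than ">" means a > that happens to appear inside a header description does not split the record:

```perl
use strict;
use warnings;

# Hypothetical variant for standard FASTA: split records on "\n>".
sub read_fasta_records {
    my ($fh) = @_;
    my %seqs;
    local $/ = "\n>";                          # record separator: newline followed by ">"
    while (my $rec = <$fh>) {
        chomp $rec;                            # removes the trailing "\n>" if present
        $rec =~ s/^>//;                        # only the first record keeps its leading ">"
        my ($id, @lines) = split /\n/, $rec;
        next unless defined $id && length $id;
        $id =~ s/\s+$//;                       # drop trailing whitespace (e.g. "\r")
        ($seqs{$id} = join '', @lines) =~ s/\s+//g;   # join multi-line sequences
    }
    return \%seqs;
}

# Usage with an in-memory file.
my $data = ">Pvivax_1\nAAGG\nTTT\n\n>Pvivax_2\nTTGGCCC\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $val = read_fasta_records($fh);
close $fh;
print "$_ => $val->{$_}\n" for sort keys %$val;
```

Because $/ is localized inside the sub, the normal line-by-line reading behaviour is restored for the rest of the program.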
    
10.03.2015 / 12:30