Read FASTA files in Python and skip the first line

-1

I need to read a FASTA file, but I do not know how to remove the first line from the string. Example:

>sequence A
ggtaagtcctctagtacaaacacccccaatattgtgatataattaaaattatattcatat
tctgttgccagaaaaaacacttttaggctatattagagccatcttctttgaagcgttgtc

While testing, I noticed that if I add letters to the first line (for example >sequence aaaA), those letters are included in the count.

How do I exclude the first line from my letter count?

    
asked by anonymous 11.09.2017 / 21:35

1 answer

3

Assuming the file has a format similar to this:

>SEQUENCE 1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE 2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH

I assume that what you want to remove are the lines with the format >SEQUENCE xxxx (or similar). I should say up front that I do not know this format beyond a quick look at Wikipedia, but I think your goal is simple: if that really is all you need, just read the FASTA file line by line.

arquivo = 'foo.dat'  # Your FASTA file

with open(arquivo, 'r') as f:  # Open for reading; the file is closed automatically
    lines = f.readlines()  # Read all lines into a list

relist = []  # New list that keeps only the lines of interest

for line in lines:
    if not line.startswith('>'):  # Skip the header lines that begin with >
        relist.append(line)

print(relist)  # Show the list

Now, if what you want is to remove the first line whatever it contains, just use .pop(0) on the list of lines, like this:

arquivo = 'foo.dat'

with open(arquivo, 'r') as f:
    lines = f.readlines()  # Read all lines into a list

firstLine = lines.pop(0)  # Remove the first line from the list (the file object has no pop)

print(lines)
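As an alternative to .pop(0), the file object can be advanced past its first line with the built-in next() before the rest is read. The following is only a sketch of that idea, not something from the question: the helper name read_without_first_line and the temporary demo file are hypothetical.

```python
import os
import tempfile


def read_without_first_line(path):
    """Return every line of the file except the first one (hypothetical helper)."""
    with open(path, 'r') as f:
        # Consume the first line; the None default avoids StopIteration on an empty file.
        next(f, None)
        return f.readlines()


# Demo with a small temporary file standing in for a real FASTA file (assumed data).
with tempfile.NamedTemporaryFile('w', suffix='.fasta', delete=False) as tmp:
    tmp.write(">sequence A\nggtaag\ntcctct\n")
    demo_path = tmp.name

lines = read_without_first_line(demo_path)
os.remove(demo_path)
print(lines)
```

Note that this only drops the very first line; a file with several >SEQUENCE headers still needs the filtering loop from the first example.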

To turn the list into a string ("text"), just use str.join(list). Each line still ends with its newline character, so the result keeps the line breaks. For the first example it looks like this:

''.join(relist)

And so for the second:

''.join(lines)
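Since the question is ultimately about a letter count, here is a minimal sketch that puts the pieces together. It assumes header lines start with > and that whitespace should not be counted; the function name count_letters and the sample data are hypothetical and only there to illustrate.

```python
import io
from collections import Counter


def count_letters(handle):
    """Count sequence characters in a FASTA-like stream, skipping '>' header lines."""
    counts = Counter()
    for line in handle:
        if line.startswith('>'):
            continue  # Header line: not part of the sequence
        # split() + join() drops newlines and any stray spaces before counting
        counts.update(''.join(line.split()))
    return counts


# Assumed sample data; with a real file, pass the object returned by open(...) instead.
sample = io.StringIO(">sequence A\nggta\nagtc\n")
counts = count_letters(sample)
print(counts)
```

Removing the whitespace matters here: ''.join(relist) on its own keeps the newline characters, and they would otherwise show up in the count.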
    
11.09.2017 / 22:04