Depends on the Data Pattern
You can use a regular expression based on the sample data you posted, but it is hard to know whether it will work for every row, because the data does not follow a consistent pattern.
Files like this are usually either delimited by a single character or fixed-width, with a set number of characters for each column. Yours follows neither convention.
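To make the two conventions concrete, here is a minimal Python sketch. Both rows and the column widths (6, 2, 10) are invented for illustration and are not taken from your file.

```python
# Hypothetical rows, invented only to contrast the two layouts.
delimited_row = "0201;Z;DEMONSTRAR"
fixed_width_row = "0201  Z DEMONSTRAR"

# Delimited: one separator character marks every column boundary.
delimited_fields = delimited_row.split(";")

# Fixed width: each column occupies a known number of characters.
# The widths below are assumptions chosen for this example.
widths = [6, 2, 10]
fixed_fields = []
start = 0
for width in widths:
    # Slice out the column and drop the padding spaces.
    fixed_fields.append(fixed_width_row[start:start + width].strip())
    start += width

print(delimited_fields)
print(fixed_fields)
```

Both approaches yield the same three fields here. Your file is harder because the description columns contain spaces and have no fixed width.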
I made an example in Java using an expression that works for your example line:
String REGEX = "\\s([\\dZ]+)\\s";
String INPUT = "0 02 020 0201 020110 Z DEMONSTRAR COMPETÊNCIAS PESSOAIS 1 Primar pela correção de atitudes";
String REPLACE = " ;$1 ;";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
INPUT = m.replaceAll(REPLACE);
System.out.println(INPUT);
You need these imports in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Result:
0 ;02 ;020 ;0201 ;020110 ;Z ;DEMONSTRAR COMPETÊNCIAS PESSOAIS ;1 ;Primar pela correção de atitudes
You can use RegExr to test with more sample lines and adapt the expression to your needs. Notepad++ also supports find and replace with regular expressions, in case you only need to do this operation once for the file.
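If you would rather check many lines at once, a small script can flag the rows where the expression produces an unexpected number of fields. This is only a sketch in Python using the same expression as the Java example. The names `to_semicolons` and `unexpected_lines` are hypothetical, the expected count of 9 fields is an assumption taken from the sample line, and the second test line is invented to show a failure.

```python
import re

# Same expression as the Java example above.
FIELD = re.compile(r"\s([\dZ]+)\s")

# Assumption: every row should produce 9 fields, as the sample line does.
EXPECTED_FIELDS = 9


def to_semicolons(line):
    """Apply the same replacement as the Java example."""
    return FIELD.sub(r" ;\1 ;", line)


def unexpected_lines(lines):
    """Return (line number, field count) for rows with the wrong field count."""
    problems = []
    for number, line in enumerate(lines, start=1):
        count = to_semicolons(line).count(";") + 1
        if count != EXPECTED_FIELDS:
            problems.append((number, count))
    return problems


lines = [
    "0 02 020 0201 020110 Z DEMONSTRAR COMPETÊNCIAS PESSOAIS 1 Primar pela correção de atitudes",
    # Hypothetical line: a digit inside the description creates extra fields.
    "0 02 020 0201 020110 Z DEMONSTRAR COMPETÊNCIAS PESSOAIS 2 Atender 3 clientes",
]
print(unexpected_lines(lines))
```

Any row reported by `unexpected_lines` needs a closer look before you trust the converted file.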
A Python converter
Anthony has since posted an answer that implements the file parser in Java, and I believe it is the best answer to the problem. Since I had already downloaded the file and suggested in a comment that you split each line into two parts, I decided to leave a Python example that does what I suggested.
import re
line_count = 1
with open('C:\\temp\\CBO2002 - PerfilOcupacional.csv', 'w') as w:
    with open('C:\\temp\\CBO2002 - PerfilOcupacional.txt') as r:
        for line in r:
            if line_count == 1:
                # parse the header: put a semicolon after each column name
                header = re.sub(r"([\w_]+)\s*", r"\1;", line)
                w.write(header + '\n')
            elif line_count > 2:
                # discard line 2 and
                # split each row into two groups that have a defined pattern
                side_a = line[0:22]
                side_b = line[23:]
                # parse each group
                parse_side_a = re.sub(r"(\d)\s([\d|\w])", r"\1;\2", side_a)
                parse_side_b = re.sub(r"([^\d]+)\s(\d+)\s(.+)", r"\1;\2;\3", side_b)
                # join the two groups (the line break is already in group B)
                line_out = parse_side_a + ';' + parse_side_b
                w.write(line_out)
            line_count += 1
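To try the two-group idea without touching any files, the same split can be applied to the sample line from the first part of this answer. This is a sketch under assumptions: the function name `convert_line` and the constant `SIDE_A_WIDTH` are hypothetical, the backreferences `\1`, `\2` and `\3` in the replacement strings are what the group counts imply, and the code columns are assumed to always fill exactly 22 characters.

```python
import re

# Assumption: the code columns (side A) always occupy the first 22 characters.
SIDE_A_WIDTH = 22


def convert_line(line):
    """Convert one data row to semicolon-separated fields."""
    side_a = line[:SIDE_A_WIDTH]
    # Skip the single space that separates the two groups.
    side_b = line[SIDE_A_WIDTH + 1:]
    # Side A: a space between a digit and the next code becomes a semicolon.
    parsed_a = re.sub(r"(\d)\s([\d|\w])", r"\1;\2", side_a)
    # Side B: text, a number, then the remaining text up to the line break.
    parsed_b = re.sub(r"([^\d]+)\s(\d+)\s(.+)", r"\1;\2;\3", side_b)
    return parsed_a + ";" + parsed_b


sample = "0 02 020 0201 020110 Z DEMONSTRAR COMPETÊNCIAS PESSOAIS 1 Primar pela correção de atitudes\n"
print(convert_line(sample), end="")
# prints: 0;02;020;0201;020110;Z;DEMONSTRAR COMPETÊNCIAS PESSOAIS;1;Primar pela correção de atitudes
```

Rows whose first description contains a digit would still break the side B expression, so it is worth checking the output for lines with an unexpected number of semicolons.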