Using Python to Extract Equations, Figures, and Other LaTeX File Items


I'm working on a personal project that involves making an article written in LaTeX as clean as possible before sending it out for translation.

To increase my productivity, avoid leaking information, and make things easier for the translator, who is not very familiar with LaTeX, I decided to send a "cleaner" version of the original text, for example without the equations. The same idea can later be extended to figures and other elements.

Problem: once the desired items (equations, for example) have been extracted, two new files are saved: one containing only the equations and one clean file without them. I've assembled the code below, which already does this. The challenge is: (1) each extraction must leave a reference behind, so that the same equation can be put back in the same place once the translated text is received; (2) there must be a second script that restores the original equations.

Any strategy suggestions to better address this challenge?

The current working code follows; it does not yet fulfill items (1) and (2) above.

print('Script started')

# Each pair is (opening marker, closing marker) of an environment to extract.
lista = [('begin{equation}', 'end{equation}'),
         ('begin{equation*}', 'end{equation*}'),
         ('begin{eqnarray}', 'end{eqnarray}'),
         ('begin{eqnarray*}', 'end{eqnarray*}'),
         ('begin{align}', 'end{align}'),
         ('begin{align*}', 'end{align*}')]

with open('document.tex', 'r', encoding='utf-8') as infile:
    inOrig = infile.readlines()

# Pass 1: copy every equation block, grouped by environment type, to its own file.
with open('document_equacoes.tex', 'w', encoding='utf-8') as outfile_eq:
    for begin, end in lista:
        print('Examining ' + begin + ' and ' + end)
        extract_block = False
        for line in inOrig:
            if begin in line:
                extract_block = True
            if extract_block:
                outfile_eq.write(line)
            if end in line:
                extract_block = False
                outfile_eq.write("%------------------------------------------\n\n")

# Pass 2 (kept separate to make the logic easier to follow):
# write the text without the equation blocks.
with open('document_limpo.tex', 'w', encoding='utf-8') as outfile_tex:
    extract_block = False
    for line in inOrig:
        is_delimiter = False  # True when this line opens or closes a block
        for begin, end in lista:
            if begin in line:
                extract_block = True
                is_delimiter = True
            if end in line:
                extract_block = False
                is_delimiter = True
        if not (extract_block or is_delimiter):
            outfile_tex.write(line)

print('Script finished')

The LaTeX document I used for testing is shown below. To match the code above, it should be saved as "document.tex".

\documentclass{article}
\usepackage[utf8]{inputenc} % Enables accented characters.
\usepackage[english,brazil]{babel}
\usepackage{lipsum}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\title{Titulo do Artigo}
\author{Nome do Autor}
\begin{document}
\maketitle
\begin{abstract}
    \lipsum[1]
\end{abstract}
    \section{Primeira Seção}
    \lipsum[2-4]
\section[Exemplo de Fórmula]{Fórmula}
Neste trecho existe um exemplo de como aparece geralmente 
uma equação. A primeira equação é a de Báskara, conforme (\ref{eq:bask01})
\begin{equation}
    x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}
    \label{eq:bask01}
\end{equation}
Outra forma de expressar as fórmulas que também precisam ser verificadas abaixo
\begin{eqnarray*}
    x =& a^b\\
    y =& h^{\pi.r}
\end{eqnarray*}
A seguinte é muito parecida com a de Báskara e pode ser visto em (\ref{eq:bask02}), no entanto não existe em literatura.
\begin{equation}
    x = \frac{-b/2 \pm \sqrt{b^2-4ac}}{2acb}
    \label{eq:bask02}
\end{equation}
Outra forma de expressar as fórmulas que também precisam ser verificada
\begin{eqnarray}
    x &=& a^b\\
    y &=& h^{\pi.r}
\end{eqnarray}
Finalmente outro método com $k_2$ tal como (\ref{eq:seqEq})
\begin{align}
    k_1&= s^2\\
    k_2&= k^2 \label{eq:seqEq}
\end{align}
Fim das descrições gerais
\end{document}
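
One strategy that might cover items (1) and (2) is to replace each extracted block with a numbered placeholder and keep a mapping from placeholder to original block. The following is only a minimal sketch under assumptions, not a tested solution for real articles: the placeholder format `%%EQ0001%%` and the names `extract_blocks` and `restore_blocks` are hypothetical, each block is assumed to end on its own line (the placeholder is a LaTeX comment, so anything after it on the same line would be hidden), and the translator is assumed to leave the placeholder lines untouched.

```python
import re

# Environments to pull out; this mirrors the list used in the current script.
ENVIRONMENTS = ["equation", "equation*", "eqnarray", "eqnarray*", "align", "align*"]

# Hypothetical placeholder format. It starts with '%', so LaTeX reads it as a comment.
PLACEHOLDER = "%%EQ{:04d}%%"
PLACEHOLDER_RE = re.compile(r"%%EQ\d{4}%%")


def build_block_regex(environments):
    """Build a regex matching a whole \\begin{env} ... \\end{env} block."""
    names = "|".join(re.escape(name) for name in environments)
    # \1 forces \end{...} to name the same environment as \begin{...}.
    return re.compile(r"\\begin\{(" + names + r")\}.*?\\end\{\1\}", re.DOTALL)


def extract_blocks(text, environments=ENVIRONMENTS):
    """Return (clean_text, blocks); blocks maps each placeholder to its block."""
    blocks = {}

    def replace(match):
        key = PLACEHOLDER.format(len(blocks) + 1)
        blocks[key] = match.group(0)
        return key

    clean_text = build_block_regex(environments).sub(replace, text)
    return clean_text, blocks


def restore_blocks(text, blocks):
    """Put every stored block back where its placeholder sits."""
    missing = [key for key in blocks if key not in text]
    if missing:
        # Fail loudly if the translated text lost a placeholder.
        raise ValueError("placeholders missing: " + ", ".join(missing))
    # A function replacement is inserted literally, so backslashes are preserved.
    return PLACEHOLDER_RE.sub(lambda m: blocks.get(m.group(0), m.group(0)), text)
```

In this sketch, the first script would call `extract_blocks` on the contents of "document.tex", write the clean text to "document_limpo.tex", and save the dictionary (for example with `json.dump`). The second script would load that dictionary and call `restore_blocks` on the translated file. Because blocks are numbered in document order, every equation returns to the position of its own placeholder even if the surrounding text has changed.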
    
asked by anonymous 01.10.2018 / 00:03

0 answers