Algorithm to distribute entries in an encyclopedia based on number of characters per day

12

I have generated a list of entries in an encyclopedia and I want to find the best way to separate the entries so that I choose a number of days and I have a schedule of which entries I read per day.

(The list is here: link )

The first item is the volume, the second the entry, the third the number of characters and the fourth is an id (generated from the date the entry was saved on my pc).

I would like to make a script that would generate a schedule, I would enter the number of days and it would divide the total number of characters of the encyclopedia and separate the entries from the average number of characters.

Example: I enter with the number of days, 120. The script will generate a schedule so that I can read all the characters of the three volumes of the encyclopedia in 120 days. For example:

The schedule should be 120 days, and each day will have an X number of entries having their number of characters added equaling a number as close to the average number of characters per day.

The entries can be read in any order and on any volume, what matters is that they are distributed so that an almost equal amount of characters are read each day.

I have asked a previous question related to this and a user suggested the prediction error algorithm, but I am not able to adapt it to this script.

See: What algorithm to distribute paragraphs ?

How could I do this?

    
asked by anonymous 12.05.2017 / 22:24

3 answers

2

I did not read the other answers, and I do not know "smart" way to solve your question. That said, I found the problem interesting, and tried to solve "my way" (nail).

I suppose you do not want / can break an entry in the middle - otherwise you could make a "perfect" split and you would not have asked a question.

I say this because I noticed that you have entries with 70 characters - and others with 7000, which makes it difficult to believe a good distribution.

What the code I've written is meant to do, is this:

  

  • Fill in the desired number of days with the largest entries found
  •   
  • Search for days whose character summation is furthest from the average
  •   
  • Add new entries to these days
  • And that's it. In my tests, the result seems reasonable, but I confess that I caught so much of the programming itself that I could not test the possible solutions a lot!

      

    The program expects a text file, with formatting identical to the one you posted in pastebin - information for each entry separated by semicolon , and entries separated from each other by a new line :

    vol1;Ageu, Livro de;4629;1494553563.48
    vol1;Agricultura;6593;1494553566.18
    vol1;Água;8947;1494553571.57
    vol1;Águia;10794;1494553577.0
    vol1;Aguilhada;1688;1494553582.39
    vol1;Agulha;580;1494553585.11
    vol1;Agulha, Orifício da;1418;1494553587.82
    
      

    Result:   

    "use strict";
    function mostraDados() {
    	if (window.carregado) {
    		document.getElementById("qtdcaracteres").value = window.qtdcaracteres;
    		document.getElementById("qtdverbetes").value = window.qtdverbetes;
    		if (document.getElementById("qtddias").value > window.qtdverbetes) {
    			document.getElementById("qtddias").value = window.qtdverbetes;
    		}	
    		var qtddias = document.getElementById("qtddias").value;
    		document.getElementById("qtdmedia").value = Math.round(window.qtdcaracteres/qtddias);
    		window.qtdmedia = Math.round(window.qtdcaracteres/qtddias);
    	}
    }
    
    function saida(div, texto) {
    	div.innerHTML = div.innerHTML + texto;	
    }
    
    function limpa(div) {
    	div.innerHTML = "";	
    }
    function criaVerbetes(meuArray) {
    		var obj = new Array();
    		for (var i = 0; i < meuArray.length; i++) {
    			obj[i] = {
    				vol: (meuArray[i].split(";"))[0],
    				titulo: (meuArray[i].split(";"))[1],
    				caracteres: (meuArray[i].split(";"))[2],
    				id: (meuArray[i].split(";"))[3],
    				pegaCaracteres: function() { return parseInt(this.caracteres); },
    			}
    		}
    	return obj;
    }
    
    function contaCaracteres(meuArray) {
    		var qtdCaracteres = 0;
    		for (var i = 0; i < meuArray.length; i++) {
    			qtdCaracteres += meuArray[i].pegaCaracteres();
    		}
    	return qtdCaracteres;
    }
    
    function ordena(meuArray) {
    		meuArray.sort(function(a, b) {
    			return a.pegaCaracteres() - b.pegaCaracteres();
    		});
    }
    
    function ordenaId(meuArray) {
    		meuArray.sort(function(a, b) {
    			return a.id - b.id;
    		});
    }
    
    function listaVerbetes(meuArray) {
    	for (var i = 0; i < meuArray.length; i++) {
    		console.log(meuArray[i].titulo + ": " + meuArray[i].pegaCaracteres());
    	}
    }
    
    function criaArrayDeDias() {
    	var dias = document.getElementById("qtddias").value;
    	var meuArray = new Array();
    	
    	for (var i = 0; i < dias; i++) {
    		meuArray[i] = {
    			verbete: [],
    			pegaSoma: function() {
    				var ct=0;
    				for (var i = 0; i < this.verbete.length; i++) {
    					ct += this.verbete[i].pegaCaracteres();
    				}
    				return ct;
    			}
    		}
    	}
    
    	return meuArray;
    }
    
    function calcula() {
    	if (window.carregado) {
    		window.objDia = criaArrayDeDias();
    		window.colecaoVerbetes = criaVerbetes(arrayVerbetes);
        
                    ordena(colecaoVerbetes);
    
    		for (var i = 0; i < window.objDia.length; i++) {
    			window.objDia[i].verbete[window.objDia[i].verbete.length] = window.colecaoVerbetes.pop();
    		}
    		
    		var longeDaMedia = 0;
    		var index;
    		while(window.colecaoVerbetes.length > 0) {
    			for (var i = 0; i < window.objDia.length; i++) {
    			var provisorio = window.qtdmedia - window.objDia[i].pegaSoma();
    				if (longeDaMedia < provisorio) {
    					longeDaMedia = provisorio;
    					index = i;
    				}			
    			}
    			window.objDia[index].
    				verbete[window.objDia[index].
    					verbete.length] = window.colecaoVerbetes.pop();
    			longeDaMedia = 0;
    		}
    
    		// saiu do while! hora de morfar
    		var div = document.getElementById('saida');
    		var div2 = document.getElementById('saida2');
    		for (var i = 0; i < window.objDia.length; i++) {
    				ordenaId(window.objDia[i].verbete);				
    		}
    
    		limpa(div);
    		for (var z = 0; z < window.objDia.length; z++) {
    			saida(div, "Dia " + z + ":<br>");
    			saida(div, "soma de caracteres: " + window.objDia[z].pegaSoma() + "<br>");
    			saida(div, "numero de verbetes: " + window.objDia[z].verbete.length + "<br><br>");
    		}
    
    
    		var div = document.getElementById('saida');
    		limpa(div2);
    				for (var z = 0; z < window.objDia.length; z++) {
    			saida(div2, "Dia " + z + ":<br>");
    			for (var g = 0; g < window.objDia[z].verbete.length; g++) {
    				saida(div2, window.objDia[z].verbete[g].vol + ", " +
    					window.objDia[z].verbete[g].titulo + "<br>");
    			}
    			saida(div2, "<br>");
    		}
    
    	} else { console.log("nao carregado"); }
    }
    
    function leArquivo(e) {
    	var arquivo = e.target.files[0];
    	var leitor = new FileReader();
    	leitor.onload = function(e) {
    		window.arquivoTexto = e.target.result;
    		window.arrayVerbetes = arquivoTexto.split("\n");
    		window.carregado = true;
    		window.objDia = criaArrayDeDias();
    		window.objVerbetes = criaVerbetes(arrayVerbetes);
    		window.colecaoVerbetes = window.objVerbetes;
    		window.qtdverbetes = window.objVerbetes.length;
    		window.qtdcaracteres = contaCaracteres(objVerbetes);
    		mostraDados();
    	};
    leitor.readAsText(arquivo);
    }
    
    document.getElementById('arquivo').addEventListener('change', leArquivo, false);
    document.getElementById('qtddias').addEventListener('change', mostraDados, false);
    label {display:block;}
    textarea {display:block;}
    
    .container{
        font-family: Tahoma, Verdana, Segoe, sans-serif;
        padding: 10px;
        background-color:#d6d6c2;
        display:flex;
    }
    .container2{
        color: #000;
        height:400px;
        overflow-y:scroll;
    }
    .divide{
        padding: 10px;
        flex-grow: 1;
    }
    <!DOCTYPE html>
    <html>
    <body> 
    <head>
    <meta charset="utf-8"/>
    </head>
    <body>
    <div class="container">
    	<div class="divide">
    			<label for="title">Quantidade de verbetes</label>
    			<textarea id="qtdverbetes" rows="2" cols="15" readonly></textarea>
    			<label for="title">Quantidade de caracteres</label>
    			<textarea id="qtdcaracteres" rows="2" cols="15" readonly></textarea>
    			<label for="title">Media caracteres/dia</label>
    			<textarea id="qtdmedia" rows="2" cols="15" readonly></textarea>
    		</div>
    		<div class="divide">
    			<input type="file" id="arquivo"/><hr>
    			<label for="title">Numero de dias</label>
    			<textarea id="qtddias" rows="2" cols="15"></textarea>
    			<hr>
    			<button onclick="calcula()">Calcular</button>
    		</div>
    		</div>
    	<div class="container">
    		<div class="divide">
    			<p>Contagem de caracteres:</p>
    			<div class="container2" id="saida"></div>
    		</div>
    		<div class="divide">
    			<p>Verbetes:</p>
    			<div class="container2" id="saida2"></div>
    		</div>
    <script src=".\verbetes.js">
    </script>
    </body>
    </html>
        
    22.05.2017 / 01:05
    0

    Hello,

    Well the entries can be read randomly? example on the same day read volume 1, 2 and 3 entries at random?

    If you can not, you can do something like the logic below:

    Variaves do script:
    
    somaDeCaracteres = soma dos caracteres de todas os verbetes
    dias = numero de dias desejados
    media = media de caracteres diarios
    numeroDeVerbetes = quantidade de verbetes totais
    caracteresRestantesNoDia = ?
    
    Logica: 
    
    1 - 
    Para cada verbete em verbetes{
        somaDeCaracteres = somaDeCaracteres + verbete.caracteres
        numeroDeVerbetes = numeroDeVerbetes + 1
    }
    
    2 - 
    
    media = somaDeCaracteres  dias
    
    3 - 
    caracteresRestantesNoDia = media
    diaAtual = 1
    Para cada verbete em verbetes{
        se caracteresRestantesNoDia - verbete.caracteres > 0 {
            adicionaVerbeteNaLista(diaAtual)
            caracteresRestantesNoDia = caracteresRestantesNoDia - verbete.caracteres
        }senao{
            diaAtual = diaAtual + 1
            caracteresRestantesNoDia = media
        }
    }
    

    So he respects the media, in case an entry more than once exceeds the average it does not add ... so the last day will have characters less than the others.

        
    17.05.2017 / 03:59
    0

    Solving this problem is not difficult, the problem is feasibility due to computational cost.

    A simple way to solve: create an algorithm that generates all the possible combinations and then check which varies less with respect to the mean, which can be done for example, verifying which has the least quadratic deviation , which is easy.

    The difficulty of this problem is due to its complexity that grows exponentially, that is, its computational cost increases greatly by increasing the size of the problem.

    I suggest that you see this video that explains this very well (although it is very understandable in Spanish). link

    Of course, in the solution I gave, you can simplify some things so you do not have to test all the combinations and find a better solution, but the solution will still be exponentially complex.

    With 120 days and 4321 entries, the time to find the answer will be absurdly large, but it is the only way to ensure the best distribution has been found.

    What can be done is to develop an idea for an ideal solution , where you will not find the best distribution, but perhaps the result is satisfactory for your purpose, but this depends on your goal. If it was only the best solution, unfortunately there is not much to do, but if a non-ideal solution is also valid we can try some ideas. The question that remains is what is your goal?

        
    19.05.2017 / 20:13