Permutations and files

3

I'm working on a toy project that uses permutations to generate word lists, written to several separate files. Problems start once words with more than 4 letters are required, because the corresponding file becomes too large. What I want is that whenever a file reaches a given number of lines (499 999, for example), the script opens another file with the same name plus a numeric suffix. For 4-character words that would be "wl4_1.txt", "wl4_2.txt", ..., where "wl4_2.txt" is the continuation of "wl4_1.txt": the first line of "wl4_2.txt" would be line 500 000 of the list. In other words, we close "wl4_1.txt" and carry on writing the list in "wl4_2.txt".

The code below works fine; I just want to add the behaviour described above. This example generates words of up to 3 characters, and the 3-character file is already large (830 584 lines):

import itertools
import string

def main():

    alphabet = string.letters + string.digits + string.punctuation
    alphaLen = len(alphabet)

    print alphabet
    for i in range(3):
        NumToPerm = i+1 #remove 0 from the permutations function
        fileTest = open("word_lists/wl" +str(NumToPerm)+ ".txt", "w")

        perm(fileTest, alphabet, NumToPerm)

def perm(fileTest, alphabet, NumToPerm):

        for p in itertools.product(alphabet, repeat=NumToPerm):

            word = str(p)

            for char in word:
               if char in " (),'":
                  word = word.replace(char,'')

            fileTest.write(word+ '\n')

        fileTest.close()

main()
    
asked by anonymous 14.03.2015 / 10:15

2 answers

2

First, you do not need to call perm from within get_file_extn: just return to the calling function. If you call perm from there, generation starts again from scratch, which is not what you want (you want to continue from where you left off).

Just remember to return the new fileTest, since the previous one has been closed (trying to use it would raise an error). And do not close it twice!

def get_file_extn(fileTest, alphabet, NumToPerm, countExtn):
    #fileTest.close()
    fileTest = open("word_lists/wl" +str(NumToPerm)+ "_" +str(countExtn)+ ".txt", "w")
    return fileTest

def perm(fileTest, alphabet, NumToPerm):
    ...
    if countWords == 999:
        fileTest.close()
        fileTest = get_file_extn(fileTest, alphabet, NumToPerm, countExtn)
        countExtn = countExtn + 1

Second, once countWords reaches 999, if you do not reset it to zero it will go on to 1000 and keep growing, without ever entering the if again. Since the next line increments it by 1, set it to zero at the end of the if:

    if countWords == 999:
        fileTest.close()
        fileTest = get_file_extn(fileTest, alphabet, NumToPerm, countExtn)
        countExtn = countExtn + 1
        countWords = 0

    countWords = countWords + 1
    ...

Third, you are setting countExtn to 1 inside the for loop, so it is reset to 1 for every word generated. Assign it before the for instead:

    countExtn = 1
    for p in itertools.product(alphabet, repeat=NumToPerm):

With this you get the split into files. One last detail: the first file, opened in main, does not follow the wlX_Y.txt convention but is simply wlX.txt. It receives the first 999 words, while wlX_1.txt receives the thousandth onwards. It would be preferable for main to create file 1 and for perm to start countExtn at 2 (since 1 has already been created):

def main():
    ...
    fileTest = open("word_lists/wl" +str(NumToPerm)+ "_1.txt", "w")

    perm(fileTest, alphabet, NumToPerm)  

def perm(fileTest, alphabet, NumToPerm):
    ...
    countExtn = 2
    for p in itertools.product(alphabet, repeat=NumToPerm):
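
Putting the three corrections together, a minimal self-contained sketch might look like the one below. It is not the asker's exact code: the function name write_words and the parameters out_dir and max_lines are hypothetical, each word is built with "".join(p), and the sketch targets Python 3 (the code in the question is Python 2). The output directory is assumed to exist already.

```python
import itertools
import os


def write_words(alphabet, length, out_dir, max_lines):
    """Write every word of the given length, rotating files every max_lines lines.

    Files are named wl<length>_<n>.txt, with n starting at 1.
    Returns the list of paths created. (Hypothetical helper, for illustration.)
    """
    paths = []
    out = None
    lines_in_file = 0
    for p in itertools.product(alphabet, repeat=length):
        # Open the next numbered file when none is open yet
        # or when the current one is full.
        if out is None or lines_in_file == max_lines:
            if out is not None:
                out.close()
            name = "wl%d_%d.txt" % (length, len(paths) + 1)
            path = os.path.join(out_dir, name)
            out = open(path, "w")
            paths.append(path)
            lines_in_file = 0  # reset the counter for the new file
        out.write("".join(p) + "\n")
        lines_in_file += 1
    if out is not None:
        out.close()  # close the last file exactly once
    return paths
```

Calling write_words("abc", 2, some_dir, 4), for instance, should produce three files holding 4, 4 and 1 words. Keeping the counter and the file number inside one function avoids the reset and reassignment mistakes described above.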
    
14.03.2015 / 13:35
0

My proposal changes the approach of the question slightly:

  • Simplify the script so that it only generates the permutations of the length given on the command line, writing them to stdout.
  • Let the operating system do the splitting into files (split command).

That is:

import itertools
import string
import sys

def main():
    alphabet = string.letters + string.digits + string.punctuation
    perm(alphabet, int(sys.argv[1]))

def perm(alphabet, NumToPerm):
    for p in itertools.product(alphabet, repeat=NumToPerm):
        print  "".join(p)

main()
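
A side effect of building each word with "".join(p) is that punctuation in the alphabet survives intact. The approach in the question (str(p) followed by stripping the characters in " (),'") also strips those characters when they are part of the word itself. The small check below illustrates the difference; strip_repr and join_tuple are hypothetical names introduced only for this comparison.

```python
def strip_repr(p):
    """Build the word as the question does: str(tuple), then strip tuple syntax."""
    word = str(p)
    for char in " (),'":
        word = word.replace(char, "")
    return word


def join_tuple(p):
    """Build the word by concatenating the tuple's characters directly."""
    return "".join(p)


p = ("a", "(", ",")
print(strip_repr(p))  # prints: a    (the "(" and "," were lost)
print(join_tuple(p))  # prints: a(,
```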

How to use it:

python x.py 3 | split -d --additional-suffix=.txt  -l 50000 - wl
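
The -d and --additional-suffix options are, as far as I know, specific to GNU split, so they may not be available on every system. Where they are not, the same chunking can be sketched in Python with itertools.islice. The function name split_lines and its parameters are hypothetical, and the naming (wl00.txt, wl01.txt, ...) only imitates what the command above is expected to produce.

```python
import itertools


def split_lines(lines, lines_per_file, prefix, suffix=".txt"):
    """Write an iterable of newline-terminated lines to numbered files.

    Files are named <prefix>00<suffix>, <prefix>01<suffix>, ...
    Returns the list of file names created. (Hypothetical helper.)
    """
    lines = iter(lines)
    names = []
    while True:
        # Take at most lines_per_file lines without loading the whole input.
        chunk = list(itertools.islice(lines, lines_per_file))
        if not chunk:
            break
        name = "%s%02d%s" % (prefix, len(names), suffix)
        with open(name, "w") as out:
            out.writelines(chunk)
        names.append(name)
    return names

# Possible use in a separate script reading the generator's output:
#     import sys
#     split_lines(sys.stdin, 50000, "wl")
```

Only one chunk is held in memory at a time, so the full word list never has to fit in RAM.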
    
17.03.2015 / 18:06