If possible, how can I work with integer values with 1 million digits or more in python?

4

I'm running some experiments with prime numbers and needed to process huge integers, however when trying to process a small routine an error occurs in the following line:

  OverflowError: int too large to convert to float

How can I fix this error?

Is it possible to work with extremely long numbers in python?

    
asked by anonymous 14.09.2018 / 13:20

1 answer

6

Yes - it is possible - Whole numbers in Python has no size limit. I would not know though if the internal implementation for operations with these numbers will be efficient enough to do multiple operations with those numbers.

The mistake you are having is that somewhere in your accounts you made an operation that resulted in a floating-point number. In Python 3, a division, for example, does this - Floating-point numbers are limited to the Float 64 pattern ( IEEE 754 -2008 ), (or another type, depending on the CPU architecture). These numbers will lose precision - that is, discard digits for any number greater than 2 ** 53 - which correspond to about 16 decimal digits (16, not 16000).

To "play some operations," you can certainly use the native Python integers - just be careful to use the "whole split", which uses the double-co_de% operator instead of // which converts the result to float (and of course, the / module works well with large integers).

If you are doing serious research, there may be some large integer library that uses the GPU efficiently - and has links to Python

(google google)

Here - link - does not use the GPU, but uses arbitrary precision tip libraries, with Python binding, ensuring that you will be able to use the maximum performance of your CPU and still experience from the interactive mode in Python or Jupyter notebook. The only reference to such a library using the GPU that I find is called "gpump", but I did not find any instructions on how to install it, much less something about being able to use it from Python (and anyway, it's just for hardware Nvidia).

Below are some examples in an interactive ipython section, creating and manipulating a 2 million digit number. (I use the % extension of Ipython to display the time that python takes to compute an expression):

In [2]: %time a = int('7' * 1_000_000)
CPU times: user 4.5 s, sys: 2.21 ms, total: 4.5 s
Wall time: 4.5 s

In [3]: %time a = int('7' * 2_000_000)
CPU times: user 17.9 s, sys: 10.9 ms, total: 17.9 s
Wall time: 17.9 s

In [4]: b = a // 3

In [5]: %time b = a // 3
CPU times: user 10.7 ms, sys: 979 µs, total: 11.7 ms
Wall time: 11.9 ms

In [6]: %time c = a % 33333333333333333333333333333333333
CPU times: user 19.2 ms, sys: 1.96 ms, total: 21.2 ms
Wall time: 21.3 ms

In [7]: c
Out[7]: 11111888888888888888888888888888888

In [8]: d = a / 4
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-8-a0e9c689eb57> in <module>()
----> 1 d = a / 4

OverflowError: integer division result too large for a float

(Note also that since Python 3.7 we can use %time as a separator of digits in integers - it is easy to write 2 million as _ )

Also - the time to create and convert a number of 1 million digits in a string to an integer of that size was 4.5 seconds - from 2 million digits, jumped to 17.9 seconds - which indicates that this conversion is an algorithm not linear, and something like 3 or 4 million digits may already be unfeasible to be created that way - but the numerical operations with those numbers, once created, are quite fast.

I ended up installing gmpy2 here - on Linux there may be packages for your distribution. As I use Python with virtualenv and install each package locally, to install in fedora I needed to install the 2_000_000 packages of the binary libraries that it uses (In Ubuntu / Debian they would be devel packages, installed as -dev command) - these commands in the shell:

sudo dnf install mpfr-devel mpc-devel
pip install gmpy2

If I were to install in Python's system, it would simply be: apt . In the above link there are instructions for installation on other operating systems.

Here are the operations equivalent to those I did with native Python integers and their times:

In [13]: from gmpy2 import mpz

In [14]: %time a = mpz('7' * 2_000_000)
CPU times: user 111 ms, sys: 7.77 ms, total: 118 ms
Wall time: 119 ms

In [15]: %time b = a // 4
CPU times: user 1.65 ms, sys: 0 ns, total: 1.65 ms
Wall time: 1.69 ms

In [16]: %time c = a % 33333333333333333333333333333333333333333333
CPU times: user 5.24 ms, sys: 54 µs, total: 5.29 ms
Wall time: 5.37 ms

In [17]: d = a / 3333333333333

In [18]: type(d)
Out[18]: mpfr

That is: building an integer from string dropped from 18 seconds to 120 milliseconds - about 100 X faster. And the division fell from 11.9 milliseconds to 1.7 milliseconds - about 5 times faster. In addition, gmpy2 also includes arbitrary precision decimal types (such as Python's Decimal.Decimal) - with automatic conversion in the division - see the last example, the numeric type becomes the "mpfr" class.

PS. I used google to answer the final part of the question, but it's not such a trivial search - you have to know what to look for - by no means meant "you could have used Google instead of asking."

    
14.09.2018 / 14:40