Fastest way to determine if an integer's square root is an integer

I'm looking for the fastest way to determine if a long value is a perfect square (ie its square root is another integer):

  • I've done it the easy way, by using the built-in Math.sqrt() function, but I'm wondering if there is a way to do it faster by restricting yourself to integer-only domain.
  • Maintaining a lookup table is impratical (since there are about 231.5 integers whose square is less than 263).
  • Here is the very simple and straightforward way I'm doing it now:

    public final static boolean isPerfectSquare(long n)
    {
      if (n < 0)
        return false;
    
      long tst = (long)(Math.sqrt(n) + 0.5);
      return tst*tst == n;
    }
    

    Notes: I'm using this function in many Project Euler problems. So no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to do every algorithm in less than a minute, and this function will need to be called millions of times in some problems.


    A new solution posted by A. Rex has proven to be even faster. In a run over the first 1 billion integers, the solution only required 34% of the time that the original solution used. While the John Carmack hack is a little better for small values of n, the benefit compared to this solution is pretty small.

    Here is the A. Rex solution, converted to Java:

    private final static boolean isPerfectSquare(long n)
    {
      // Quickfail
      if( n < 0 || ((n&2) != 0) || ((n & 7) == 5) || ((n & 11) == 8) )
        return false;
      if( n == 0 )
        return true;
    
      // Check mod 255 = 3 * 5 * 17, for fun
      long y = n;
      y = (y & 0xffffffffL) + (y >> 32);
      y = (y & 0xffffL) + (y >> 16);
      y = (y & 0xffL) + ((y >> 8) & 0xffL) + (y >> 16);
      if( bad255[(int)y] )
          return false;
    
      // Divide out powers of 4 using binary search
      if((n & 0xffffffffL) == 0)
          n >>= 32;
      if((n & 0xffffL) == 0)
          n >>= 16;
      if((n & 0xffL) == 0)
          n >>= 8;
      if((n & 0xfL) == 0)
          n >>= 4;
      if((n & 0x3L) == 0)
          n >>= 2;
    
      if((n & 0x7L) != 1)
          return false;
    
      // Compute sqrt using something like Hensel's lemma
      long r, t, z;
      r = start[(int)((n >> 3) & 0x3ffL)];
      do {
        z = n - r * r;
        if( z == 0 )
          return true;
        if( z < 0 )
          return false;
        t = z & (-z);
        r += (z & t) >> 1;
        if( r > (t  >> 1) )
        r = t - r;
      } while( t <= (1L << 33) );
      return false;
    }
    
    private static boolean[] bad255 =
    {
       false,false,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,
       true ,true ,false,false,true ,true ,false,true ,false,true ,true ,true ,false,
       true ,true ,true ,true ,false,true ,true ,true ,false,true ,false,true ,true ,
       true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,true ,false,
       true ,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,false,
       true ,false,true ,true ,false,false,true ,true ,true ,true ,true ,false,true ,
       true ,true ,true ,false,true ,true ,false,false,true ,true ,true ,true ,true ,
       true ,true ,true ,false,true ,true ,true ,true ,true ,false,true ,true ,true ,
       true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,false,true ,
       true ,true ,true ,false,false,true ,true ,true ,true ,true ,true ,true ,true ,
       true ,true ,true ,true ,true ,false,false,true ,true ,true ,true ,true ,true ,
       true ,false,false,true ,true ,true ,true ,true ,false,true ,true ,false,true ,
       true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,true ,true ,
       false,true ,false,true ,true ,false,true ,true ,true ,true ,true ,true ,true ,
       true ,true ,true ,true ,false,true ,true ,false,true ,true ,true ,true ,true ,
       false,false,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true ,
       true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,false,
       true ,true ,true ,true ,false,true ,true ,true ,false,true ,true ,true ,true ,
       false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false,
       true ,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true ,false,
       true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false,false,true ,
       true ,false,true ,false,true ,true ,true ,false,true ,true ,true ,true ,false,
       true ,true ,true ,false,true ,false,true ,true ,true ,true ,true ,true ,true ,
       true ,true ,true ,true ,true ,false,true ,false,true ,true ,true ,false,true ,
       true ,true ,true ,false,true ,true ,true ,false,true ,false,true ,true ,false,
       false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,false,true ,
       true ,false,false,true ,true ,true ,true ,true ,true ,true ,true ,false,true ,
       true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false,true ,true ,
       true ,true ,false,true ,true ,true ,false,true ,true ,true ,true ,false,false,
       true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,
       false,false,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true ,
       true ,true ,true ,false,true ,true ,false,true ,true ,true ,true ,true ,true ,
       true ,true ,true ,true ,true ,false,true ,true ,false,true ,false,true ,true ,
       false,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,
       true ,true ,false,true ,true ,true ,true ,true ,false,false,true ,true ,true ,
       true ,true ,true ,true ,false,false,true ,true ,true ,true ,true ,true ,true ,
       true ,true ,true ,true ,true ,true ,false,false,true ,true ,true ,true ,false,
       true ,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,true ,
       true ,false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,true ,
       true ,true ,true ,false,false
    };
    
    private static int[] start =
    {
      1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11,
      1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203,
      129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395,
      1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587,
      257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779,
      1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971,
      385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163,
      1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355,
      513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547,
      1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739,
      641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931,
      1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973,
      769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781,
      1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589,
      897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397,
      1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205,
      1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013,
      959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821,
      1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629,
      831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437,
      1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245,
      703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53,
      1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139,
      575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331,
      1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523,
      447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715,
      1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907,
      319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099,
      1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291,
      191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483,
      1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675,
      63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867,
      2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037,
      65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845,
      1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653,
      193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461,
      1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269,
      321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077,
      1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885,
      449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693,
      1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501,
      577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309,
      1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117,
      705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75,
      1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267,
      833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459,
      1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651,
      961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843,
      1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035,
      1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227,
      895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419,
      1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611,
      767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803,
      1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995,
      639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909,
      1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717,
      511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525,
      1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333,
      383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141,
      1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949,
      255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757,
      1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565,
      127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373,
      1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181
    };
    

    I've tried the different solutions presented below.

  • After exhaustive testing, I found that adding 0.5 to the result of Math.sqrt() is not necessary, at least not on my machine.
  • The John Carmack hack was faster, but it gave incorrect results starting at n=410881. However, as suggested by BobbyShaftoe, we can use the Carmack hack for n < 410881.
  • Newton's method was a good bit slower than Math.sqrt() . This is probably because Math.sqrt() uses something similar to Newton's Method, but implemented in the hardware so it's much faster than in Java. Also, Newton's Method still required use of doubles.
  • A modified Newton's method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than Math.sqrt() .
  • Binary chop was even slower. This makes sense because the binary chop will on average require 16 passes to find the square root of a 64-bit number.
  • The one suggestion which did show improvements was made by John D. Cook. You can observe that the last hex digit (ie the last 4 bits) of a perfect square must be 0, 1, 4, or 9. This means that 75% of numbers can be immediately eliminated as possible squares. Implementing this solution resulted in about a 50% reduction in runtime.

    Working from John's suggestion, I investigated properties of the last n bits of a perfect square. By analyzing the last 6 bits, I found that only 12 out of 64 values are possible for the last 6 bits. This means 81% of values can be eliminated without using any math. Implementing this solution gave an additional 8% reduction in runtime (compared to my original algorithm). Analyzing more than 6 bits results in a list of possible ending bits which is too large to be practical.

    Here is the code that I have used, which runs in 42% of the time required by the original algorithm (based on a run over the first 100 million integers). For values of n less than 410881, it runs in only 29% of the time required by the original algorithm.

    private final static boolean isPerfectSquare(long n)
    {
      if (n < 0)
        return false;
    
      switch((int)(n & 0x3F))
      {
      case 0x00: case 0x01: case 0x04: case 0x09: case 0x10: case 0x11:
      case 0x19: case 0x21: case 0x24: case 0x29: case 0x31: case 0x39:
        long sqrt;
        if(n < 410881L)
        {
          //John Carmack hack, converted to Java.
          // See: http://www.codemaestro.com/reviews/9
          int i;
          float x2, y;
    
          x2 = n * 0.5F;
          y  = n;
          i  = Float.floatToRawIntBits(y);
          i  = 0x5f3759df - ( i >> 1 );
          y  = Float.intBitsToFloat(i);
          y  = y * ( 1.5F - ( x2 * y * y ) );
    
          sqrt = (long)(1.0F/y);
        }
        else
        {
          //Carmack hack gives incorrect answer for n >= 410881.
          sqrt = (long)Math.sqrt(n);
        }
        return sqrt*sqrt == n;
    
      default:
        return false;
      }
    }
    

    Notes :

  • According to John's tests, using or statements is faster in C++ than using a switch , but in Java and C# there appears to be no difference between or and switch .
  • I also tried making a lookup table (as a private static array of 64 boolean values). Then instead of either switch or or statement, I would just say if(lookup[(int)(n&0x3F)]) { test } else return false; . To my surprise, this was (just slightly) slower. I'm not sure why. This is because array bounds are checked in Java.

  • I figured out a method that works ~35% faster than your 6bits+Carmack+sqrt code, at least with my CPU (x86) and programming language (C/C++). Your results may vary, especially because I don't know how the Java factor will play out.

    My approach is threefold:

  • First, filter out obvious answers. This includes negative numbers and looking at the last 4 bits. (I found looking at the last six didn't help.) I also answer yes for 0. (In reading the code below, note that my input is int64 x .)
    if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
        return false;
    if( x == 0 )
        return true;
  • Next, check if it's a square modulo 255 = 3 * 5 * 17. Because that's a product of three distinct primes, only about 1/8 of the residues mod 255 are squares. However, in my experience, calling the modulo operator (%) costs more than the benefit one gets, so I use bit tricks involving 255 = 2^8-1 to compute the residue. (For better or worse, I am not using the trick of reading individual bytes out of a word, only bitwise-and and shifts.)
    int64 y = x;
    y = (y & 4294967295LL) + (y >> 32); 
    y = (y & 65535) + (y >> 16);
    y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
    // At this point, y is between 0 and 511.  More code can reduce it farther.
    
    To actually check if the residue is a square, I look up the answer in a precomputed table.
    if( bad255[y] )
        return false;
    // However, I just use a table of size 512
    
  • Finally, try to compute the square root using a method similar to Hensel's lemma. (I don't think it's applicable directly, but it works with some modifications.) Before doing that, I divide out all powers of 2 with a binary search:
    if((x & 4294967295LL) == 0)
        x >>= 32;
    if((x & 65535) == 0)
        x >>= 16;
    if((x & 255) == 0)
        x >>= 8;
    if((x & 15) == 0)
        x >>= 4;
    if((x & 3) == 0)
        x >>= 2;
    At this point, for our number to be a square, it must be 1 mod 8.
    if((x & 7) != 1)
        return false;
    The basic structure of Hensel's lemma is the following. (Note: untested code; if it doesn't work, try t=2 or 8.)
    int64 t = 4, r = 1;
    t <<= 1; r += ((x - r * r) & t) >> 1;
    t <<= 1; r += ((x - r * r) & t) >> 1;
    t <<= 1; r += ((x - r * r) & t) >> 1;
    // Repeat until t is 2^33 or so.  Use a loop if you want.
    The idea is that at each iteration, you add one bit onto r, the "current" square root of x; each square root is accurate modulo a larger and larger power of 2, namely t/2. At the end, r and t/2-r will be square roots of x modulo t/2. (Note that if r is a square root of x, then so is -r. This is true even modulo numbers, but beware, modulo some numbers, things can have even more than 2 square roots; notably, this includes powers of 2.) Because our actual square root is less than 2^32, at that point we can actually just check if r or t/2-r are real square roots. In my actual code, I use the following modified loop:
    int64 r, t, z;
    r = start[(x >> 3) & 1023];
    do {
        z = x - r * r;
        if( z == 0 )
            return true;
        if( z < 0 )
            return false;
        t = z & (-z);
        r += (z & t) >> 1;
        if( r > (t >> 1) )
            r = t - r;
    } while( t <= (1LL << 33) );
    The speedup here is obtained in three ways: precomputed start value (equivalent to ~10 iterations of the loop), earlier exit of the loop, and skipping some t values. For the last part, I look at z = r - x * x , and set t to be the largest power of 2 dividing z with a bit trick. This allows me to skip t values that wouldn't have affected the value of r anyway. The precomputed start value in my case picks out the "smallest positive" square root modulo 8192.
  • Even if this code doesn't work faster for you, I hope you enjoy some of the ideas it contains. Complete, tested code follows, including the precomputed tables.

    typedef signed long long int int64;
    
    int start[1024] =
    {1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11,
    1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203,
    129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395,
    1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587,
    257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779,
    1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971,
    385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163,
    1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355,
    513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547,
    1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739,
    641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931,
    1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973,
    769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781,
    1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589,
    897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397,
    1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205,
    1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013,
    959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821,
    1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629,
    831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437,
    1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245,
    703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53,
    1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139,
    575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331,
    1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523,
    447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715,
    1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907,
    319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099,
    1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291,
    191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483,
    1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675,
    63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867,
    2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037,
    65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845,
    1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653,
    193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461,
    1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269,
    321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077,
    1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885,
    449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693,
    1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501,
    577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309,
    1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117,
    705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75,
    1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267,
    833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459,
    1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651,
    961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843,
    1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035,
    1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227,
    895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419,
    1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611,
    767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803,
    1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995,
    639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909,
    1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717,
    511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525,
    1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333,
    383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141,
    1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949,
    255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757,
    1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565,
    127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373,
    1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181};
    
    bool bad255[512] =
    {0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
     1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
     0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
     1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
     1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
     1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
     1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
     1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
     0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
     1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
     0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
     1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
     1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
     1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
     1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
     1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
     0,0};
    
    inline bool square( int64 x ) {
        // Quickfail
        if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
            return false;
        if( x == 0 )
            return true;
    
        // Check mod 255 = 3 * 5 * 17, for fun
        int64 y = x;
        y = (y & 4294967295LL) + (y >> 32);
        y = (y & 65535) + (y >> 16);
        y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
        if( bad255[y] )
            return false;
    
        // Divide out powers of 4 using binary search
        if((x & 4294967295LL) == 0)
            x >>= 32;
        if((x & 65535) == 0)
            x >>= 16;
        if((x & 255) == 0)
            x >>= 8;
        if((x & 15) == 0)
            x >>= 4;
        if((x & 3) == 0)
            x >>= 2;
    
        if((x & 7) != 1)
            return false;
    
        // Compute sqrt using something like Hensel's lemma
        int64 r, t, z;
        r = start[(x >> 3) & 1023];
        do {
            z = x - r * r;
            if( z == 0 )
                return true;
            if( z < 0 )
                return false;
            t = z & (-z);
            r += (z & t) >> 1;
            if( r > (t  >> 1) )
                r = t - r;
        } while( t <= (1LL << 33) );
    
        return false;
    }

    I'm pretty late to the party, but I hope to provide a better answer; shorter and (assuming my benchmark is correct) also much faster.

    long goodMask; // 0xC840C04048404040 computed below
    {
        for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i);
    }
    
    public boolean isSquare(long x) {
        // This tests if the 6 least significant bits are right.
        // Moving the to be tested bit to the highest position saves us masking.
        if (goodMask << x >= 0) return false;
        final int numberOfTrailingZeros = Long.numberOfTrailingZeros(x);
        // Each square ends with an even number of zeros.
        if ((numberOfTrailingZeros & 1) != 0) return false;
        x >>= numberOfTrailingZeros;
        // Now x is either 0 or odd.
        // In binary each odd square ends with 001.
        // Postpone the sign test until now; handle zero in the branch.
        if ((x&7) != 1 | x <= 0) return x == 0;
        // Do it in the classical way.
        // The correctness is not trivial as the conversion from long to double is lossy!
        final long tst = (long) Math.sqrt(x);
        return tst * tst == x;
    }
    

    The first test catches most non-squares quickly. It uses a 64-item table packed in a long, so there's no array access cost (indirection and bounds checks). For a uniformly random long , there's a 81.25% probability of ending here.

    The second test catches all numbers having an odd number of twos in their factorization. The method Long.numberOfTrailingZeros is very fast as it gets JIT-ed into a single i86 instruction.

    After dropping the trailing zeros, the third test handles numbers ending with 011, 101, or 111 in binary, which are no perfect squares. It also cares about negative numbers and also handles 0.

    The final test falls back to double arithmetic. As double has only 53 bits mantissa, the conversion from long to double includes rounding for big values. Nonetheless, the test is correct (unless the proof is wrong).

    Trying to incorporate the mod255 idea wasn't successful.


    You'll have to do some benchmarking. The best algorithm will depend on the distribution of your inputs.

    Your algorithm may be nearly optimal, but you might want to do a quick check to rule out some possibilities before calling your square root routine. For example, look at the last digit of your number in hex by doing a bit-wise "and." Perfect squares can only end in 0, 1, 4, or 9 in base 16, So for 75% of your inputs (assuming they are uniformly distributed) you can avoid a call to the square root in exchange for some very fast bit twiddling.

    Kip benchmarked the following code implementing the hex trick. When testing numbers 1 through 100,000,000, this code ran twice as fast as the original.

    public final static boolean isPerfectSquare(long n)
    {
        if (n < 0)
            return false;
    
        switch((int)(n & 0xF))
        {
        case 0: case 1: case 4: case 9:
            long tst = (long)Math.sqrt(n);
            return tst*tst == n;
    
        default:
            return false;
        }
    }
    

    When I tested the analogous code in C++, it actually ran slower than the original. However, when I eliminated the switch statement, the hex trick once again make the code twice as fast.

    int isPerfectSquare(int n)
    {
        int h = n & 0xF;  // h is the last hex "digit"
        if (h > 9)
            return 0;
        // Use lazy evaluation to jump out of the if statement as soon as possible
        if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8)
        {
            int t = (int) floor( sqrt((double) n) + 0.5 );
            return t*t == n;
        }
        return 0;
    }
    

    Eliminating the switch statement had little effect on the C# code.

    链接地址: http://www.djcxy.com/p/86592.html

    上一篇: '开关'比'如果'更快?

    下一篇: 确定整数的平方根是否为整数的最快方法