Chess IQ Test
Compiled by Jim Monaghan
I. Tactical IQ Test for Chess Programs
This is a test suite from the master section of Livshits' book "Test Your Chess IQ". There are 360 positions that are carefully balanced with medium to hard examples.
The test as presented here is intended to estimate the tactical strength of chess engines. The test duration is one hour. Run the test on a chess engine at 10 seconds per position, note the percentage score achieved and compare it with the table below:
Percent Correct   IQ Elo   Correct Solutions
100               2764     360
90                2644     324
80                2524     288
70                2404     252
60                2284     216
50                2164     180
40                2044     144
30                1924     108
20                1804     72
10                1684     36
0                 1564     0
The floor for the test is 1564, the ceiling is 2764. Note that each percentage point correct earns 12 "IQ Elo" points. So the rating formula is:
IQ elo = 1564 + (% correct x 12)
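As a quick check of the arithmetic, the formula can be written as a small Python function. This is only a sketch: the name `iq_elo` and the `total` parameter are illustrative and not part of the test suite.

```python
def iq_elo(solved, total=360):
    """Return the IQ Elo for `solved` correct answers out of `total`."""
    percent_correct = 100.0 * solved / total
    # Floor of 1564, plus 12 points per percentage point correct.
    return 1564 + 12 * percent_correct


# Print the rounded rating for a few solved counts.
for solved in (0, 180, 360):
    print(solved, round(iq_elo(solved)))
```

Rounding to the nearest whole number matches how the ratings are reported in the tables.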
The test generates a tactical IQ rating for an engine on whatever hardware it is tested on. Strategic factors and endgame knowledge are not considered in the test and are in no way measured. The ratings don't compare to "real" rating lists. It's just for fun, not too serious.
II. Comparison of Ply Depth versus Rating Performance
Working with average ply depth instead of time makes the table below more universally applicable to different programs and hardware. Again there is nothing absolute about the rating numbers -- it's the differences that are important. By pumping the new IQ test suite through Yace 0.99.56 at ascending ply depths I came up with the following table:
Cel 1.3 GHz/256 (32 MB HT)
Avg Depth (Plys)   Found (Total=360)   Percent Correct   IQ Elo (Max=2764)   Rating Gain (Difference)
5                  159                 44.17%            2094                ---
6                  206                 57.22%            2251                157
7                  241                 66.94%            2367                116
8                  271                 75.28%            2467                100
9                  288                 80.00%            2524                57
10                 308                 85.56%            2591                67
11                 315                 87.50%            2614                23
Notice in the Ply Depth table above that Yace solved 315/360 at 11 ply. Even after 15 minutes, at however many ply that reaches, a ceiling is definitely reached. Piercing this barrier will take either some unique type of pruning or extensions, or a lot more speed.
The tree really explodes at 12 ply on a lot of these positions, and searching it would take a huge amount of time. I would expect very little gain in performance anyway.
Knowing (or assuming) that the relationship between ply depth and rating performance is a logarithmic function, I wonder if a mathematician can extrapolate this table working with columns 1 and 5. Hopefully there is enough of a trend. The rating gain going from ply 8 to ply 9 is either a little low at 57, or the gain from ply 9 to ply 10 is a little high at 67, or both. The chart could perhaps step a little better, but I'm reporting it as directly as it came out. Maybe the table can be smoothed out for projection purposes. What would the expected gain be going from ply 11 to 12, ply 12 to ply 13, etc? Is there a ceiling? In a theoretical sense, I guess there isn't a ceiling -- but practically when you consider time spent to achieve higher plys, there might as well be. Interesting...
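One rough way to attempt that extrapolation is a least-squares fit of IQ Elo against the logarithm of the ply depth. The sketch below assumes the model elo = a + b * ln(depth), which is only the guess stated above and not an established law. The names `fit_log_curve` and `predicted_gain` are made up for illustration.

```python
import math

# Average ply depth and IQ Elo, copied from the Yace table above.
DEPTHS = [5, 6, 7, 8, 9, 10, 11]
ELOS = [2094, 2251, 2367, 2467, 2524, 2591, 2614]


def fit_log_curve(depths, elos):
    """Least-squares fit of elo = a + b * ln(depth); returns (a, b)."""
    xs = [math.log(d) for d in depths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(elos) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, elos))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predicted_gain(b, depth):
    """Fitted rating gain for going from `depth` to `depth + 1` plies."""
    return b * math.log((depth + 1) / depth)


a, b = fit_log_curve(DEPTHS, ELOS)
for depth in (11, 12, 13):
    print(depth, "->", depth + 1, round(predicted_gain(b, depth)))
```

Under this model the gain per extra ply keeps shrinking but never reaches zero, and the curve knows nothing about the 2764 cap of the test. The measured gains at the deepest plies also fall off faster than a smooth logarithmic curve would, so any projection from it is a rough guide at best.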
III. Real Tough IQ
The following 16 positions are unsolved by Yace 0.99.75c after one hour per position on a P4 2.53:
6k1/1p3r1p/r2q1Pp1/2p1n1P1/p1PpP3/P2P2Q1/1P4B1/4RRK1 w - - bm Rf5; id "IQ.1023";
5rk1/p1q1ppb1/3p2p1/3P1bBp/1pr4P/5PN1/PPPQ2P1/1KR4R b - - bm Bc3; id "IQ.1027";
2rq1rk1/pp1bnpbp/4p1p1/3pP1N1/3P2Q1/2PB4/P4PPP/R1B1R1K1 w - - bm Nxh7; id "IQ.1031";
r3qrk1/1b3p2/1p1npnp1/2b1N1N1/p1P5/P1B5/1PB1QP1P/R2R2K1 w - - bm Nd7; id "IQ.1057";
r3k2r/2pn1pp1/p1p2qb1/Np2P1b1/6P1/3P2B1/PPP2P2/RN1Q1RK1 b kq - bm Nxe5; id "IQ.1111";
r2r4/pp3p2/4bkpp/8/7P/3B1P2/PP4P1/1K1R3R b - - bm Rxd3; id "IQ.1149";
1rb1r1k1/3n1ppp/p1p1p3/q3P3/7P/2N1Q1R1/PPP3P1/2KR1B2 w - - bm Rxd7; id "IQ.1182";
r2qnrk1/pp2ppb1/3p3B/2p5/7Q/1PNP3b/1PP3PP/R4RK1 w - - bm Ne4; id "IQ.1191";
r2r2k1/pp3ppp/2p1bn2/7q/NbP3P1/1P2B2P/PQ3PB1/R4RK1 b - - bm Bxg4; id "IQ.1194";
r1b2rk1/2q1bppp/p2pp3/2n3PQ/1p1BP3/1BN5/PPP2P1P/2KR2R1 w - - bm Bf6; id "IQ.1198";
2rq1rk1/1b3pb1/5np1/1pn1p1Np/4P2Q/2N1B3/BPP3PP/R4R1K w - - bm Nxf7; id "IQ.1226";
r4rk1/1bpq1ppp/3p1b2/2nP1N2/8/1p3Q1P/PPB2PP1/R1B1R1K1 w - - bm Nxg7; id "IQ.1248";
rnb3kr/1p1nqppp/p3p3/2ppP3/3P1N2/2NB1Q2/PPP2PP1/R3K2R w KQ - bm Bxh7+; id "IQ.1253";
2kr3r/pp1b1p2/1qn1pb2/2p5/4Q3/5N1B/PPP2PPP/R1B2RK1 b - - bm Rxh3; id "IQ.1259";
2r2rk1/p1n3pp/1p2p3/1q1pQP2/3Pn3/6N1/PP4PP/R1B2RK1 w - - bm Bh6; id "IQ.1270";
r2q1rk1/3nbpp1/4p2p/p1p1P3/1p1P3P/3B1b1R/PPQB1PP1/R3K3 w - - bm Bxh6; id "IQ.1274";
I need to analyse these positions more to see if Yace is "seeing" something that the solutions have missed.
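The positions in this document are EPD records: four position fields (piece placement, side to move, castling rights, en passant square) followed by operations such as "bm" (best move), "am" (avoid move) and "id". For anyone scoring results by hand or by script, here is a minimal Python sketch of how such a record could be read and checked. The function names are made up for illustration. The sketch assumes each record sits on a single line, that a space separates every opcode from its operand, and that the engine's move is given in the same algebraic notation as the solutions. Check symbols ("+" and "#") are stripped before comparing, since a missing "+" is an easy way to miss a match.

```python
def parse_epd(record):
    """Split one EPD record into (position, operations)."""
    fields = record.split(None, 4)
    position = " ".join(fields[:4])
    operations = {}
    rest = fields[4] if len(fields) > 4 else ""
    # Assumes no semicolons occur inside quoted operands.
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        opcode, _, operand = chunk.partition(" ")
        operations[opcode] = operand.strip().strip('"')
    return position, operations


def normalize(move):
    """Drop trailing check and mate symbols so 'Bxh7+' matches 'Bxh7'."""
    return move.rstrip("+#")


def is_solved(operations, engine_move):
    """True if the move is one of the 'bm' moves, or avoids every 'am' move."""
    played = normalize(engine_move)
    if "bm" in operations:
        return played in [normalize(m) for m in operations["bm"].split()]
    if "am" in operations:
        return played not in [normalize(m) for m in operations["am"].split()]
    return False


record = ('rnb3kr/1p1nqppp/p3p3/2ppP3/3P1N2/2NB1Q2/PPP2PP1/R3K2R '
          'w KQ - bm Bxh7+; id "IQ.1253";')
position, operations = parse_epd(record)
print(operations["id"], is_solved(operations, "Bxh7"))
```

Because "bm" may list several moves, a position with alternate solutions counts as solved when the engine plays any one of them.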
IV. "Rating List"
George Lyapko ran the old IQ test on his AMD K6-2/450 for most of the free Winboard programs back in Nov. 2001, using 10 sec/move, no opening book, and no EGTBs.
Program             Score   %    IQ Elo
Yace_09956          247     69   2387
Phalanx_XXII        245     68   2381
LG2000_30           241     67   2367
TCB_0045            228     63   2324
Bringer_18          226     63   2317
Gromit_300          225     63   2314
Crafty_1812         222     62   2304
Pepito_142          222     62   2304
Bionic_401          221     61   2301
Nejmet_260          221     61   2301
Zchess_222          219     61   2294
Glc_215c            219     61   2294
Pharaon_250         217     60   2287
Anmon_515           216     60   2284
Inmi_305            215     60   2281
Tao_44              214     59   2277
Amy_07c             213     59   2274
WildCat_261         210     58   2264
Comet_b37           209     58   2261
Exchess_402         209     58   2261
KingOfKings_200     205     57   2247
Bestia_083          204     57   2244
Terra_25            204     57   2244
Knightx_171a        201     56   2234
Ant_606             196     54   2217
Arasan_54           195     54   2214
Quark_150           193     54   2207
Gnuchess_414        192     53   2204
Fortress_162        190     53   2197
Beowulf_17          189     53   2194
LordKing_III        187     52   2187
Dragon_42           187     52   2187
Freyr_1067          184     51   2177
Queen_211           179     50   2161
Sjeng_11            178     49   2157
Ssechess_2045       178     49   2157
Gerbil_02           175     49   2147
Amyan_148           162     45   2104
Esc_104             161     45   2101
Olithink_305        156     43   2084
Tristram_416        154     43   2077
Ghost_v0_13         150     42   2064
Gully2_c            148     41   2057
Grizzly_125         137     38   2021
Monik_211           135     38   2014
Rzeznik_14          130     36   1997
EnginMax_287        129     36   1994
Ufim_143            128     36   1991
Holmes_050Beta      128     36   1991
Mint_23             119     33   1961
Chessterfield_i5a   112     31   1937
Aldebaran_070       103     29   1907
StAndersen_12       81      23   1834
Skaki_119c          81      23   1834
Storm_06            66      18   1784
Ozwald_043          53      15   1741
And two results on an ancient Am5x86-P75-S/133:
Program             Score   %    IQ Elo
Yace_09956          147     41   2054
Bestia_083          139     39   2027
Although not perfect, IQ seems to be a moderately good predictor of playing strength.
V. Debug Notes
I've been debugging the old IQ suite lately. The following 10 positions needed to be replaced. For the first nine, I've found alternate solutions, so the second solution just needs to be added. In the 10th position, the "+" symbol was missing from the solution. These changes have been incorporated into the new IQ.epd file.
2rr2k1/pb3p1p/1pq3p1/4R1N1/2n5/P4P2/BP2Q1PP/4R2K w - - bm Nxf7 Re7; id "IQ.932";
2r1k1nr/3bbpp1/p2p2P1/4pP2/1pqNP3/2N1B3/PPP3Q1/1K1R2R1 w k - bm b3 Nd5; id "IQ.964";
1kqr2r1/ppp5/2nb1p2/1Q1N2pb/P2P4/1RN1P3/3B1PP1/5R1K b - - bm Be2 Rh8; id "IQ.1011";
5rk1/5pp1/3b4/1pp2qB1/4R2Q/1BPn4/1P3PPP/6K1 b - - bm Bf4 Ra8; id "IQ.1091";
5b2/pr4pk/4P1Rp/1pppBPP1/8/1P2P3/P6K/8 w - - bm Rxh6+ gxh6; id "IQ.1123";
2kr2r1/Qpq2p1p/1n2p3/2b2p2/8/2B2PP1/PPPN3P/R3K2R b - - bm Rxd2 Rxg3; id "IQ.1204";
r1b2k1r/1p4pp/p4B2/2bpN3/8/q2n4/P1P2PPP/1R1QR1K1 w - - bm Bxg7+ Qh5; id "IQ.1244";
r1n1nrk1/p4p1p/1q4pQ/2p1pN2/1pB1P1P1/5P2/PPP4P/1K1R3R w - - bm Rd6 Rhg1; id "IQ.1276";
r3r1k1/bpp1q1pp/p3bp2/2p4Q/4N3/1P2PP2/PB3P1P/R2R3K w - - bm Nxf6+ Rg1; id "IQ.1287";
r4rk1/4ppbp/1q2bnp1/n1p4P/4P1P1/2NBBP2/PP1Q4/1K1R2NR b - - bm Bxa2+; id "IQ.1290";
(11-14-2002)
Four additional lines are corrected. Thanks to Uri, Andreas, and GCP.
4rk2/1p3r2/3q1nQp/1R1P2p1/P2pp3/3B2PP/1P3P2/2R3K1 w - - bm Qxh6+ Rxb7; id "IQ.967";
4rbk1/1q3ppp/2Rr4/1p1P1B2/2b1PR2/p5P1/5P1P/B1Q3K1 w - - bm Bxh7+ Rxd6 Rh4 Bxg7; id "IQ.973";
r1b1k2r/p1p1nppp/2p5/3q4/8/1P4P1/P1P1QK1P/RNB1R3 b kq - am O-O; id "IQ.1172";
r6r/pb1R1pk1/1p2p1pp/3nP3/4N2Q/4B3/P3qPPP/3R2K1 w - - bm Bxh6+; id "IQ.1267";
(11-15-2002)
Two more corrections. Thanks Uri.
2qrrb2/pb3ppk/1npp3p/5N2/1p2n3/1P3NPB/PBPQ1P1P/3RR1K1 w - - bm Rxe4 Qf4; id "IQ.928";
4r2r/pp3k2/2p1pq2/3nR3/2PP1pp1/1B1Q2P1/PP3PK1/6R1 b - - bm Ne3+ Nb4; id "IQ.977";
(12-27-2002)
Seven more positions with corrected solutions. Thanks to Dieter Buerssner and Yace.
7r/8/3p1p1r/p1kP2p1/Pp2P1P1/1PpR3P/5R1K/8 b - - bm Re8; id "IQ.1121";
8/r4p2/6p1/1pknP2p/2p1b3/P1P2N2/2BK2PP/R7 b - - bm Nxc3 Bxc2; id "IQ.1150";
2r5/1p6/pq2p2p/3rN3/k2P2Q1/3R2P1/1P3PP1/1K6 b - - bm Qxb2+ Qd6; id "IQ.1157";
3rr1k1/1p2bpp1/2ppq2p/p1n5/2P4P/1PN1P1P1/PBQR1PK1/2R5 b - - bm Bxh4 Bf6; id "IQ.1158";
2r1r1k1/4qpb1/p2p2p1/1p1Pn1P1/3BB3/P1P5/1P4Q1/1K2R2R w - - bm Bxg6 Qh2 Qh3; id "IQ.1282";
4rbk1/n2n3p/b1q1p1p1/1p1pP1B1/1PrP1N2/1R3NP1/3Q1PB1/R5K1 w - - bm Nxg6 Bf1; id "IQ.1285";
r4rk1/4ppbp/1q2bnp1/n1p4P/4P1P1/2NBBP2/PP1Q4/1K1R2NR b - - bm Bxa2+ Rfb8; id "IQ.1290";
(12-27-2002) Note: IQ.1162 seems suspect. The solution of 1. Bf4 tries to seal the BQ on the kingside and then after 1... exf4, White achieves a draw due to a perpetual attack on it. But Black's 1... e4! seems to cross this plan and Black looks better.
(12/31/2002) Two suspect positions have been removed. IQ.1162 and IQ.1167 were broken by Miguel and Uri respectively. Thanks, guys. These positions have been replaced with IQ.894 and IQ.895. Total positions are still 360.
r3rnk1/4qpp1/p5np/4pQ2/Pb2N3/1B5P/1P3PP1/R1BR2K1 w - - bm Bxh6; id "IQ.894";
r1br2k1/p1q2pp1/4p1np/2ppP2Q/2n5/2PB1N2/2P2PPP/R1B1R1K1 w - - bm Bxh6; id "IQ.895";
If you run the test and would like to share your results, please send them along and they will be included here. If you find any alternative or questionable solutions, please send them in so the test can be improved.
My thanks to the posters at CCC and the WB forum for pointing out errors and omissions.
Enjoy the test.