Monday, July 11, 2016

A testcase preparation week

CodeChef SnackDown 2016 Final Round in Mumbai has continued the run of two-person team contests this week (problems, results, top 5 on the left). The winning team was 50% the same as last week's - congratulations, Gennady and Boris!

TopCoder SRM 694 followed a few hours later (problems, results, top 5 on the left, my screencast). The problems were quite standard, and jcvb and xudyh showed great mastery of flow algorithms to solve the 900 very fast and keep the battle for the first place just between themselves. In the end jcvb emerged victorious by a mere 0.09 points - congratulations!

Unusually for TopCoder, there was quite some time to spare at the end of the coding phase, and thus one had the opportunity to prepare for the challenge phase thoroughly. The required mindset is quite similar to that of the problemsetter: one needs to come up with testcases that would fail various potential incorrect solutions, and ideally ones that are likely to fail even unforeseen incorrect solutions.

Here's the approach I've chosen this time for the easy problem (you can see it in more detail, but a bit slowly, on the screencast): let's construct random testcases that have many parts, and yet exactly one way to group the parts to achieve the maximum score. An incorrect solution in this problem is likely to be some kind of greedy instead of dynamic programming, and having exactly one solution makes sure that there are many more ways for the greedy to make mistakes. Having the testcase random instead of manually crafted ensures that it doesn't become too well-structured, as greedy solutions might actually be correct for well-structured cases.

More precisely, I've started with just three numbers which would be the xors of the groups in the final answer, and then repeatedly tried to replace any number x with the pair (x xor y, y), where y is a random number, checked whether there was still just one way to achieve the maximum, and if yes, kept the replacement. I've generated a few cases using this approach, and picked the one with the most parts, avoiding parts that looked too simple (powers of 2).
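In case a concrete outline helps, here is a minimal sketch of such a generator; the helper count_optimal_groupings(parts) is hypothetical and stands for the reference solution counting how many groupings achieve the maximum score in the actual problem:

```python
import random

def generate_case(target_parts=8, max_val=1 << 20, attempts=10000):
    # A sketch of the generator described above; count_optimal_groupings is a
    # hypothetical helper returning how many groupings achieve the maximum.
    rng = random.Random()
    parts = [rng.randrange(1, max_val) for _ in range(3)]  # xors of the final groups
    for _ in range(attempts):
        if len(parts) >= target_parts:
            break
        i = rng.randrange(len(parts))
        y = rng.randrange(1, max_val)
        candidate = parts[:i] + [parts[i] ^ y, y] + parts[i + 1:]
        # keep the split only if the optimal grouping is still unique
        if count_optimal_groupings(candidate) == 1:
            parts = candidate
    return parts
```

Running this a few times and keeping the case with the most parts mirrors the manual selection step above.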

When I opened a greedy solution during the challenge phase (screencast pointer), I could then successfully use the testcase prepared in advance without crafting a case specifically against that solution. My other challenge (screencast pointer) was a bit more straightforward :)

How did you prepare for the challenge phase this time? Or maybe you remember a useful challenge phase preparation trick you've used earlier?

Codeforces held the online mirror of the 2016 Helvetic Coding Contest on Sunday (problems, results, top 5 on the left). The onsite version took place quite a while ago, and had a slightly different problemset and rules. Congratulations to team Zg on earning the victory with more than an hour to spare, and on coming head and shoulders ahead of all onsite teams as well!

Thanks for reading, and see you next week!

Saturday, July 9, 2016

An under 23 week

The last week was exclusively on Codeforces. First, Codeforces Round 360 on Wednesday provided top competitors ample time both for solving all problems and for challenging the solutions of others (problems, results, top 5 on the left, analysis). The top 3 stood out because of the challenges, but TooDifficult has also executed everything in the right order - solving before spending too much time on challenging - and thus claimed the first place. Congratulations!

VK Cup 2016 Finals was the main event of the weekend (results, top 5 on the left). Unlike last year, the winning team, built from the top two teams of 2015, didn't really leave the others any chance. Congratulations on the super convincing victory, Adam and Gennady!

My previous summary included a nice TopCoder problem: you are given just one integer k between 1 and 10^9, and need to construct a bipartite graph with exactly k perfect matchings. Not any such graph will do: it must have at most 20 vertices in each part, and at most 120 edges. There can be more than one edge between the same two vertices, and all those edges are distinct for the purpose of counting the number of perfect matchings. Apart from those restrictions, you're free to construct the graph in any way you like.

I think there are two main approaches to this kind of problem. One is to start with a random solution, and then keep modifying it towards the required property. I don't have a good feeling for whether that would've worked in this problem, although I wouldn't be surprised if it did. The other is to learn how to build answers for some values of k, and how to combine those together, and then represent the required number as a sum or product of primitives we know how to build.

Those primitives in this problem are somewhat atypical: the values of k=3^n. It's quite straightforward to construct a graph with exactly 3^n perfect matchings: it will have n vertices in each part, and for each i it will have 3 parallel edges connecting the i-th vertex in the first part with the i-th vertex in the second part. We haven't yet exceeded our limit of 20 vertices in each part, as 3^20 is more than 10^9. Every other value of k can be represented in base-3 as a sum of powers of 3 (each taken 0, 1 or 2 times), but how do we achieve addition of the numbers of perfect matchings?

Well, addition in combinatorial problems usually means that there's a choice: variant a leads to x possibilities, variant b leads to y possibilities, so if we have to choose exactly one of a or b we have x+y possibilities. And matchings lend themselves well to having choices: for each vertex, we have to choose which vertex it will match. So if we want to build a graph with 3^n+3^m perfect matchings, we can make one of its vertices have exactly two adjacent edges, picking one of them would lead to 3^n possibilities of matching the rest of the graph, and the other would give 3^m.

And here comes the "pull an idea out of the blue sky" moment: let's start with a graph with 19 vertices in each part and 3 parallel edges connecting the i-th vertex in the first part with the i-th vertex in the second part, as before. Now let's add a 20-th vertex to both parts, without adding any edges initially. Then we add one edge from the i-th vertex of the first part to the (i+1)-th vertex of the second part for all i between 1 and 19. The 20-th vertex in the first part still has no adjacent edges: this vertex will be the pivot representing addition. If we want to add 3^i perfect matchings for any i between 0 and 19, we'll add an edge from the 20-th vertex in the first part to the (i+1)-th vertex in the second part. Were a perfect matching to include this edge, it's not hard to see that the matching of vertices with numbers (i+1) and up is uniquely determined - we have to use the diagonal edges. The matching of vertices with numbers up to i, on the other hand, must use the horizontal edges, with 3 choices for each, and thus we have exactly 3^i choices in total.

We can represent any sum of powers of 3 between 3^0 and 3^19, possibly with repetitions, in this way. The base-3 representation of any k up to 3^20-1 will require at most 2*20=40 summands, which require 40 edges in our graph. In addition to those, we have 3*19 horizontal edges and 19 diagonal edges, so the total number of edges doesn't exceed 40+4*19=116, which is good enough. Also note that the boundary is quite tight, so other similar constructions might not work.
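Here is a minimal sketch of that construction in code, assuming 1-indexed vertices and edges represented simply as (left, right) pairs, with parallel edges given as repeated pairs:

```python
def build_graph(k):
    # Construction described above: 19 "horizontal" triples and 19 "diagonal"
    # edges, plus one "pivot" edge per base-3 digit of k, each such edge
    # contributing 3**i perfect matchings.
    edges = []
    for i in range(1, 20):
        edges += [(i, i)] * 3      # 3 parallel edges: 3 independent choices
        edges.append((i, i + 1))   # diagonal edge forcing the rest of the matching
    i = 0
    while k > 0:
        k, digit = divmod(k, 3)    # base-3 digits of k, least significant first
        edges += [(20, i + 1)] * digit
        i += 1
    return edges
```

For k up to 10^9 there are at most 19 base-3 digits, hence at most 38 pivot edges, so the total stays within the limits discussed above.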

Thanks for reading, and check back for this week's summary!

Monday, July 4, 2016

A perfectly matching week

The June 20 - June 26 week was much calmer than the previous one. There were two "regular" rounds, the first being Codeforces Round 359 on Thursday (problems, results, top 5 on the left, analysis).

The second regular round was TopCoder SRM 693 on Saturday (problems, results, top 5 on the left, my screencast).

The medium problem was from the rising category of "constructive" problems. You are given just one integer k between 1 and 10^9, and need to construct a bipartite graph with exactly k perfect matchings. Not any such graph will do: it must have at most 20 vertices in each part, and at most 120 edges. There can be more than one edge between the same two vertices, and all those edges are distinct for the purpose of counting the number of perfect matchings. Apart from those restrictions, you're free to construct the graph in any way you like. Can you solve this one?

Challenge24 in Budapest was the onsite competition of the week, challenging the brave with 24 hours of problem solving (problems, results, top 5 on the left). The scores were extremely close in the final standings, and the winner was team Dandelion - congratulations!

In the last week's summary, I have once again mentioned quite a few interesting problems - let's analyze the IPSC one here. To remind, we're studying two permutations of 2n objects, called L and R. The permutation L is constructed like this: the first n objects in the old order go to odd-numbered positions in the new order without shuffling, and the last n objects in the old order go to even-numbered positions in the new order, without shuffling. The permutation R does almost the same, but the first n objects go to even-numbered positions, and last n go to odd-numbered positions. You are given three numbers: n, a and b. An object is currently on position a in the permutation of 2n objects, and we want to put it to position b using only operations L and R. Construct any shortest sequence of those two operations that achieves it.

Solving this problem involved studying the properties of operations L and R, trying to understand how they work. Such exploration involved several dead ends which I won't describe here, so it might seem that we're progressing almost miraculously :)

First, we will solve this problem from the end towards the start - in other words, let's consider the reverse operations L' and R'. How do they transform our current position x? It's not hard to see that L' moves odd-numbered positions x to x/2 and even-numbered positions x to x/2+n, and R' maps odd positions to x/2+n and even positions to x/2, where "/" stands for integer division (rounding down), and positions are numbered from 0 to 2n-1.

In other words, we repeatedly do the following: remove the last bit of our number, then choose to add or not to add n. Adding n after removing i last bits can also be viewed as adding n*2^i in the very beginning. Assuming we do k steps in total, we can add n*2, n*4, ..., n*2^k, and that means we can add n*m for any even number m between 0 and 2^(k+1)-2.

After all the additions are done, we remove the k last bits, and want to obtain the given number a. It means that before removing the k last bits, our number has to be between a*2^k and (a+1)*2^k-1.

Which means that the problem has boiled down to: does there exist an even number m between 0 and 2^(k+1)-2 such that a*2^k <= b+n*m <= (a+1)*2^k-1?

This is of course easy to check for any given k. Moreover, we can see now that once n<=2^k and b/(a+1)<2^k there will always be such an m, so we need to try only a logarithmic number of k values before we find the answer! Reconstructing the sequence of L and R operations knowing the values of k and m is an exercise in carefully traversing the above reasoning.
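As a sanity check of the arithmetic part, here is a minimal sketch of the feasibility test, assuming 0-based positions and leaving the reconstruction of the actual L/R sequence out:

```python
def shortest_k_and_m(n, a, b):
    # Find the smallest k such that some even m in [0, 2**(k+1)-2] satisfies
    # a*2**k <= b + n*m <= (a+1)*2**k - 1, per the reasoning above; such a k
    # always exists, so the loop terminates.
    k = 0
    while True:
        lo, hi = a * 2**k, (a + 1) * 2**k - 1
        need = max(0, lo - b)
        m = -(-need // (2 * n)) * 2   # smallest even m with b + n*m >= lo
        if m <= 2**(k + 1) - 2 and b + n * m <= hi:
            return k, m
        k += 1
```

Checking only the smallest suitable even m is enough, since increasing m only pushes b+n*m further to the right.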

I find the way the solution flows from combinatorics to arithmetics in this problem quite beautiful. The solution explained in the official analysis skips the "from the end" part, and thus ends up being even simpler - but then it's not as satisfying to make it work :)

Thanks for reading, and check back soon for this week's summary!

Sunday, July 3, 2016

A half-plane week

The June 13 - June 19 week contained a lot of important qualification rounds for various onsite competitions. It started with Yandex.Algorithm 2016 Round 3 on Monday morning (problems requiring Yandex login, results, top 5 on the left, analysis). For the third time in a row exactly one contestant could solve all problems, and this time with 5 blind submissions - big congratulations to umnik2296!

However, most of the other results suggest that this problemset strongly favored submitting problems in the open; in particular, problem C had less than 10% of submissions correct. Unfortunately I was one of those caught out by the trick in this problem, and thus couldn't qualify for the final round - good luck to all those who did!

I think problem D was the most beautiful in this set, and it was also quite tricky with just 34% success rate. You were given... nothing, but you could ask questions. In each question, you could name an integer not exceeding 10^18 by absolute value, and you were told whether this number is black or white. You also knew that the structure of black/white numbers is very regular: more precisely, there exists a positive integer l not exceeding 10^17 such that black/white numbers form alternating blocks of l consecutive numbers each: numbers from a to a+l-1 are white, numbers from a+l to a+2l-1 are black, numbers from a+2l to a+3l-1 are white again, and so on both in positive and negative directions. However, you don't know the values of a and l. Your goal is to determine the value of l using at most 2000 questions.

Internet Problem Solving Contest 2016 is the only non-elimination round in this summary, but it stands out in many other ways, attracting many retired contestants together with the active ones (problems, results, top 5 on the left, analysis). Big congratulations to the unnamed team from Taiwan on winning it by a 2-point margin!

Problem F in this round was an interesting exercise in researching and gradually gaining an understanding until the solution becomes clear. The research subject is two permutations of 2n objects, called L and R. The permutation L is constructed like this: the first n objects in the old order go to odd-numbered positions in the new order without shuffling, and the last n objects in the old order go to even-numbered positions in the new order, without shuffling. The permutation R does almost the same, but the first n objects go to even-numbered positions, and last n go to odd-numbered positions. You are given three numbers: n, a and b. An object is currently on position a in the permutation of 2n objects, and we want to put it to position b using only operations L and R. Construct any shortest sequence of those two operations that achieves it.

TopCoder Open 2016 Round 2C has finalized the list of 120 Round 3 participants who will compete for just 8 onsite spots (problems, results, top 5 on the left, parallel round results, my screencast, analysis). Congratulations to liymsheep on the convincing victory!

Here are the 120 qualified contestants grouped by country:
Russian Federation - 30: Petr, Merkurev, kuniavski, ariacas, lhic, AMashrabov, ifsmirnov, knightL, Dembel, Egor, UdH-WiNGeR, Burunduk1, Vercingetorix, Kankuro, Endagorion, Um_nik, qwerty787788, VArtem, eatmore, 2rf, Michael_Levin, -XraY-, Enot, pashka, niyaznigmatul, RiaDWaW, HellKitsune, Copymaster, LoRd_TaPaKaH, KalininN
China - 26: jcvb, SanSiroWaltz, maopao, jiry_2, liympanda, jinzhao, edly01, dnvtmf, BSBandme, lxhimo, hzt1, quailty, zhuojie, panjf1987, zyxwvu164, ftiasch, liymsheep, ACRush, Blue.Mary, Herzu, cvcvb_lyp, matthew99a, Ronnoc, xudyh, xyz111, wenhanhuang
Japan - 17: snuke, semiexp, sugim48, rng_58, camypaper, sky58, yosupo, uwi, rickytheta, j_gui0121, chokudai, LayCurse, tozangezan, logicmachine, anta, hogloid, natsugiri
Poland - 12: embe, marek.cygan, dasko, Marcin_smu, fruwajacybyk, tom612pl, Errichto, Swistakk, mnbvmar, Stonefeang, krismaz, dj3500
Ukraine - 8: K.A.D.R, sdya, mgch, Vasyl[alphacom], LeBron, dzhulgakov, MrDindows, Milanin
Taiwan - 5: peter50216, eddy1021, ShikXD, aaaaajack, dreamoon
Belarus - 4: tourist, Arterm, subscriber, Ra16bit
South Korea - 4: ainu7, ainta, jaydoubleuel, Kriii
Australia - 2: izrak, John Dethridge
United States - 2: scott_wu, ksun48
Canada - 2: FatalEagle, azneye
Netherlands - 1: krijgertje
Brazil - 1: ffao
Switzerland - 1: W4yneb0t
Germany - 1: pwahs
Croatia - 1: ikatanic
South Africa - 1: bmerry
Indonesia - 1: azaky
Viet Nam - 1: skyvn97

The solution for the easy problem relied on a cute geometric observation. The problem went like this: you are given 1500 points on the plane. You can move from any point to any other point if there are no more points on the segment connecting them. For all pairs of points you need to determine the smallest number of moves needed to get from one to the other - and of course to do it faster than the standard O(n^3) all-pairs-shortest-paths algorithms.

Russian Code Cup 2016 Elimination Round has chosen the 50 contestants for the final round, which might or might not happen onsite (problems, results, top 5 on the left, my screencast, analysis).

The hardest problem F forced one to pick a good tradeoff between studying too many cases on paper and implementing more logic in the solution. You were given at most 100000 integers ai, each between -C and C, and a goal d. You needed to find integers xi, also between -C and C, such that the sum of ai*xi is equal to d. Additionally, none of the xi may be equal to zero.

It's not hard to see that without the "between -C and C, and non-zero" restriction a solution exists if and only if the greatest common divisor of all ai divides d. It turns out that for sufficiently large values of C nothing changes. This problem in particular had C always equal to 1000000, so solving it was a matter of careful reasoning and implementation.
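That unconstrained condition is easy to state in code; here is a minimal sketch of just the feasibility check (constructing the actual xi within the bounds is the careful part and is not shown):

```python
from functools import reduce
from math import gcd

def unconstrained_feasible(a, d):
    # Ignoring the "between -C and C, non-zero" restrictions, sum(ai*xi) = d
    # is solvable in integers exactly when the gcd of all ai divides d.
    g = reduce(gcd, a)
    return d == 0 if g == 0 else d % g == 0
```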

What is the largest value of C for which the condition mentioned above is not sufficient?

Finally, CodeChef SnackDown 2016 Elimination Round selected the 25 two-person teams qualifying for the onsite round in Mumbai (problems, results, top 5 on the left). Congratulations to the team deepdark on the victory, and to all those who qualified!

I've mentioned a few nice problems in the previous week's summary. One solution was explained in a separate post, so let me cover another one here. The problem was: you are given a convex polygon with 100000 sides. For each point strictly within the polygon we can define its asymmetry value: the maximum ratio of the two segments between this point and the boundary of the polygon along any line. For example, if that point is the center of symmetry of the polygon, its asymmetry value is 1. What is the smallest asymmetry value over all points inside the given polygon?

The first step is quite usual: instead of finding the point with the smallest asymmetry value directly, we'll learn to check if there are points with asymmetry value below the given value, relying on binary search to then produce the final answer.

Then we notice that when determining the asymmetry value of a given point, we can consider only the lines passing through a vertex of the polygon, and where the longer of the two resulting segments is between the point and the vertex (if that's not the case, we can show that rotating the line in one of the two directions will not decrease the ratio).

Now let's look at our picture from the point of view of one of the vertices A of the polygon. Which points O inside the polygon have the property that AO/OY<=m, where Y is the other point of intersection of line AO with the polygon, and m is the asymmetry value limit we're testing? It's not hard to see that those are exactly the points of a smaller polygon that is obtained by compressing our polygon m/(m+1) times towards A (see the picture on the right).

That observation reduces our problem to the following question: consider the n smaller polygons obtained by compressing our polygon m/(m+1) times towards each of its n vertices. Do they intersect?

An intersection of a few convex polygons can also be thought of as an intersection of all half-planes containing their sides. However, we have n n-sided polygons to intersect here, so we have n^2 half-planes, and that's too much.

However, all those n^2 half-planes were obtained by shifting one of the n sides towards one of the n vertices. As we see on the picture on the left, instead of looking at all n images of one side, it's enough to consider the one where it's shifted the furthest - in other words, towards the vertex that's most distant from the line containing the given side! We can find the most distant vertex for each side once using the rotating calipers algorithm, and then we have only n half-planes to intersect each time.
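For completeness, here is a minimal sketch of that rotating-calipers step, assuming the polygon vertices are given in counter-clockwise order and the polygon is strictly convex:

```python
def farthest_vertex_per_side(pts):
    # For every side (pts[i], pts[i+1]) find the index of the vertex farthest
    # from the line through that side. The distance is proportional to the
    # cross product, and the farthest vertex only advances as the side does,
    # so the total work is linear.
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    n = len(pts)
    result = []
    j = 1
    for i in range(n):
        a, b = pts[i], pts[(i + 1) % n]
        while cross(a, b, pts[(j + 1) % n]) > cross(a, b, pts[j]):
            j = (j + 1) % n
        result.append(j)
    return result
```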

And that means we have reduced the original problem to a standard one: check if n half-planes have a non-empty intersection in O(n log n) or faster. This problem has a tedious standard solution and a beautiful randomized solution. Let me describe the latter.

The randomized algorithm inductively builds the answer to the following question: what is the point (x, y) that lies in the intersection of the given half-planes and maximizes the value of ax+by, where a and b are some arbitrary constants, for example the normal vector to the first half-plane's boundary (such a pick makes sure that the value ax+by in that half-plane is bounded from above)?

Assume we already know such point (x, y) for some set of half-planes and want to add one more half-plane. There are two possibilities: either the point (x, y) belongs to the new half-plane, or not. In the former case, nothing needs to be done - the old maximum is also the maximum now.

In the latter case, it's easy to see that the new point (x, y) must lie on the boundary of the new half-plane. Knowing that, we can find it by intersecting all previous half-planes with that boundary, and then solving the 1D half-line intersection problem by finding the bottom-most downwards and top-most upwards half-lines.

That's the entire algorithm - we will add all half-planes in random order, and either find a point (x, y) in their intersection or learn that there's no such point.

The interesting part is, of course, its running time. In the worst case, we'll have to solve the 1D problem every time, for the total running time of 1+2+...+n=O(n^2). However, since we consider the half-planes in random order, it's unlikely that we will always run into the worse of the two cases. More precisely, consider the i-th step of the algorithm. The point (x, y) after this step lies on the intersection of the boundaries of some two of the first i half-planes. This point was also the answer on the previous step i-1 unless the i-th half-plane is one of those two. That means that the probability that we need to solve the 1D problem on the i-th step is at most 2/i. Solving the i-th 1D problem takes O(i), and thus the expected total running time is O(1)/1+O(2)/2+O(3)/3+...+O(n)/n=O(1)+O(1)+...+O(1)=O(n) - linear!
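Here is a rough, untested sketch of how this could look in code, assuming half-planes given as a*x + b*y <= c and, for simplicity, a large bounding box instead of the objective-direction argument above to keep the maximum finite:

```python
import random

def halfplane_intersection_point(halfplanes, objective=(1.0, 0.0), bound=1e9):
    # Randomized incremental search for a point maximizing the objective over
    # the intersection of half-planes (a, b, c) meaning a*x + b*y <= c.
    # Returns such a point, or None if the intersection (within the bounding
    # box) is empty. Degenerate half-planes (a = b = 0) are assumed absent.
    ox, oy = objective
    planes = [(1.0, 0.0, bound), (-1.0, 0.0, bound), (0.0, 1.0, bound), (0.0, -1.0, bound)]
    x = bound if ox >= 0 else -bound   # bounding-box corner maximizing the objective
    y = bound if oy >= 0 else -bound
    hs = list(halfplanes)
    random.shuffle(hs)                 # random insertion order is the whole trick
    for a, b, c in hs:
        if a * x + b * y <= c + 1e-9:  # old optimum still feasible: keep it
            planes.append((a, b, c))
            continue
        # new optimum lies on the line a*x + b*y = c; parametrize it as p0 + t*d
        if abs(b) > abs(a):
            p0, d = (0.0, c / b), (1.0, -a / b)
        else:
            p0, d = (c / a, 0.0), (-b / a, 1.0)
        lo, hi = float('-inf'), float('inf')
        feasible = True
        for pa, pb, pc in planes:      # 1D problem: intersect half-lines on the line
            coef = pa * d[0] + pb * d[1]
            rhs = pc - pa * p0[0] - pb * p0[1]
            if abs(coef) < 1e-12:
                if rhs < -1e-9:
                    feasible = False
                    break
            elif coef > 0:
                hi = min(hi, rhs / coef)
            else:
                lo = max(lo, rhs / coef)
        if not feasible or lo > hi:
            return None
        t = hi if ox * d[0] + oy * d[1] >= 0 else lo
        x, y = p0[0] + t * d[0], p0[1] + t * d[1]
        planes.append((a, b, c))
    return x, y
```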

This is the first time I describe this algorithm and I haven't yet implemented it myself, so please correct me if there are any errors or ways to implement it even simpler!

And of course, thanks for reading, and check back soon for more.

Saturday, July 2, 2016

Random problems

As discussed in the previous post, my latest Code Jam problem pointed at approximate algorithms right from the statement: you are given a rooted forest with at most 100 vertices, and consider all valid permutations of its vertices. A permutation of vertices is valid if every non-root vertex comes after its parent. Additionally, each vertex is assigned a letter, so each permutation induces a string when we write all those letters in the order of the permutation. Which fraction of valid permutations induce a string that contains the given substring? Your answer doesn't need to be very precise: an error of 0.03 is allowed.

We need to estimate a fraction that's between 0.0 and 1.0 with an error of 0.03 allowed - that doesn't sound very precise. In fact, that's just 17 times more precise than simply always answering 0.5 :) That allows us to use the Monte-Carlo approach: pick k independently and uniformly sampled valid permutations, find out the number m of them that contain the given substring, and output m/k as the answer.

There are two things still unclear about this solution: how large does k need to be to make the error small enough, and how to uniformly sample valid permutations.

The number of successes in k independent samples from a Bernoulli distribution follows a binomial distribution, and we could use its cumulative distribution function to estimate the required k. However, there's a simpler approach that works for virtually any distribution: the Central Limit Theorem. It says that for moderately large values of k the mean of k random values sampled independently from the same distribution is distributed almost like a normal distribution with mean equal to the mean of each random value, and standard deviation equal to the standard deviation of each random value divided by the square root of k.

Each random value in our case is equal to 1 with some unknown probability p, and to 0 with probability 1-p. Its mean is thus p, and its standard deviation is sqrt(p*(1-p)), which is at most sqrt(0.5*0.5)=0.5. The standard deviation of the mean of k such values is thus at most 0.5/sqrt(k), and by the theorem it's distributed almost normally. For the normal distribution we know the probability of being off its mean by a given number of standard deviations: for example, it falls outside 6 standard deviations of its mean with probability of roughly 1 in 500 million. 6 standard deviations is at most 3/sqrt(k), and thus picking k=10000 makes 6 standard deviations lie within the allowed 0.03 error.
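Put as code, the whole estimator is tiny; the two helpers below are hypothetical placeholders for the uniform sampler discussed later and for the substring check:

```python
def estimate_fraction(sample_valid_permutation, contains_substring, k=10000):
    # Monte-Carlo estimate of the fraction of valid permutations whose induced
    # string contains the given substring. With k = 10000 the standard
    # deviation is at most 0.5 / sqrt(k) = 0.005, so the allowed error of 0.03
    # is 6 standard deviations away.
    hits = sum(1 for _ in range(k) if contains_substring(sample_valid_permutation()))
    return hits / k
```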

Of course, the Central Limit Theorem can't always be blindly applied - one must understand what's going on. For example, note that the error probability of 1 in 500 million is for one number. Since the problem has actually asked you to compute 500 such numbers, the probability of going outside the allowed 0.03 error in at least one of them could only be bounded by 1 in a million, which is still a pretty remote chance - but one has to keep this point in mind when using the theorem. For example, picking k=1000 in our problem means that 0.03 would correspond to roughly 2 standard deviations, so we'd give an answer within the allowed error with probability at least 95%, which sounds good enough to submit. However, since we need to compute 500 numbers and not just one, the probability that all of them will stay within 2 standard deviations is much, much lower: 0.95^500 is about 7*10^-12, so one is very likely to fail when using k=1000.

Another issue is that "moderately large" and "almost" terms are of course not formal. The real theorem gives those terms a precise meaning, and it turns out that in our case they work as expected if p and 1-p are both significantly greater than 1/k. To see the opposite situation, consider a value of p that's less than 1/k. It's not hard to see that the mean of k values will now be almost always 0, sometimes 1/k, and much more rarely anything else - clearly not very close to being distributed normally. This issue doesn't hurt in our problem since for the values of p that are so close to 0 and 1 we'll clearly give a good answer.

The most difficult part of this problem was not the Monte-Carlo idea - it was figuring out how to sample valid permutations uniformly. I won't go into details on this part, though, as it's explained very well in the contest analysis.

What I wanted to bring up instead is the philosophical question of how should problems with approximate solutions be properly stated and checked. As an example, the solution described above can give a wrong answer once in a million attempts. Is that good enough? Would it be unfair if some contestant was so unlucky that he got a wrong answer that way and had to submit again? What if this problem allowed only one submission attempt, and thus getting a wrong answer would have a much steeper penalty - which probability of mistake is acceptable then?

This problem had another peculiar property: the reference solution used Monte-Carlo approach as well, so we could only be sure that the reference answers are correct up to some probability. In other words, one could say that with some very remote probability submitting an exact answer could give a wrong answer verdict! We have made sure that chance is really remote by using MapReduce before the contest to run a lot more attempts for each testcase, and thus we were extremely confident that our answers are precise. What do you think is an acceptable probability of mistake in this case?

Finally, some of my past Code Jam problems (Proper Shuffle, Good Luck) went even further: the inputs themselves were randomly generated, and one had to estimate the hidden variables used for this random generation. In that case the input file by itself does not completely formally determine which output files are correct and which are not! With some probability many different sets of hidden variables could be used to generate the input, and while the actual variables used for generation are known to the judges and will be compared against when checking your solution, you don't have a way to be 100% certain your output is correct even given infinite computational resources. You can, of course, be 99.9999999999999999% certain. How many nines is acceptable here?

I'm looking forward to hearing your opinions on this subject! I'll share my own boundaries in a following post.

An asymmetric week

The June 6 - June 12 week started with Codeforces Round 356 on Wednesday evening (problems, results, top 5 on the left, my screencast, analysis). It turned out to be impossible to solve all five problems in time, and thus the winning strategy involved skipping one of the easier problems to spare time for the hardest one.

Here is the problem I skipped: you are given a 500x500 grid where each cell is either black or white. You're given a number k, and are allowed to paint any kxk subgrid black (but only do that once). What's the size of the largest 4-connected black area you can get?

TopCoder SRM 692 took place very early on Friday (problems, results, top 5 on the left). In line with the current TopCoder tradition, just two people have managed to solve all three problems - congratulations Kriii and rng_58! Amusingly, all 3 top finishers were in one room, setting the stage for some challenge phase racing, but it didn't happen as the gaps were probably too big.

Yandex.Algorithm 2016 Round 2 on Friday was the second chance to score points towards qualifying for the top 25, which for most practical purposes is achieved by placing in the top 8 in one of the three rounds (problems requiring Yandex login, results, top 5 on the left, my screencast, analysis). Just like last week, the winning strategy turned out to be submitting everything in the open and racing to solve all problems, although one could place in the top 8 with any strategy. Congratulations to apiad on the victory!

The hardest problem in this round was quite beautiful geometry: you are given a convex polygon with 100000 sides. For each point strictly within the polygon we can define its asymmetry value: the maximum ratio of the two segments between this point and the boundary of the polygon along any line. For example, if that point is the center of symmetry of the polygon, its asymmetry value is 1. What is the smallest asymmetry value over all points inside the given polygon?

On Saturday Google Code Jam 2016 Online Round 3 started another big Code Jam weekend (problems, results, top 5 on the left, analysis). The competition was tough as only the top 25 would qualify for the onsite round in New York. The right strategy turned out to involve skipping the large input of problem C, which was a tricky graph problem, and solving the purely mathematical problem D instead. Congratulations to xyz111 for the convincing victory that put all strategy considerations aside!

I was the author of problem B in this round, which went like this: you are given a rooted forest with at most 100 vertices, and consider all valid permutations of its vertices. A permutation of vertices is valid if every non-root vertex comes after its parent. Additionally, each vertex is assigned a letter, so each permutation induces a string when we write all those letters in the order of the permutation. Which fraction of valid permutations induce a string that contains the given substring? Your answer doesn't need to be very precise: an error of 0.03 is allowed.

Finally, Google Distributed Code Jam 2016 Online Round 2 wrapped up this busy week on Sunday (problems, results, top 5 on the left, analysis). The results were once again much more optimistic than last year's, suggesting that the contestants have mastered the new format. Congratulations to eatmore on the victory, and to everybody who qualified for the onsite round - looking forward to seeing you all in New York!

In the last week's summary I've described a hard TopCoder problem: you're given a 50x50 integer matrix with all values between 1 and 61. We pick two (possibly intersecting or even coinciding) submatrices uniformly at random. We find X - the least common multiple of all numbers in those two submatrices combined. Now we find the number Y - the smallest possible integer that does not divide X. What's the expected value of Y?

Since the matrix is just 50x50, we can solve the problem quite easily if only one submatrix is considered: we can just iterate over all possible submatrices and find the value of X and then Y for each of them. We can also notice that the answer is always a power of a prime, and can't exceed 64 (since 64 does not divide the least common multiple of all numbers between 1 and 61).

How do we go from one submatrix to two submatrices? This is where some advanced algorithms come into play. First, instead of finding the Y for each submatrix, let's find the number of submatrices which have each value of X. The number of possible lcms of some subset of values between 1 and 61 isn't very big. Using that information we can "sum over all divisors": find the number of submatrices with the value of X dividing each interesting value. Having done that, making a jump from one submatrix to two submatrices is super easy: the number of ways to pick two submatrices such that their values of X both divide the given value is just the square of such number for one submatrix. And finally, we need to undo the sum over all divisors step: knowing the number of pairs of submatrices with lcm that divides the given value, we compute the number of pairs with lcm that's equal to the given value. Now we just need to find the value of Y corresponding to each possible value of X.

How do we transition to sum over all divisors and back? This is done via dynamic programming, similar to the approaches described in my April post and in this recent Codeforces discussion. The tricky part here is that we don't operate on bitmasks, but rather on sequences of prime powers, which have more than two possible values for smaller primes. But the logic stays exactly the same: iterate over the prime first, and then over the number.
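To make the transform itself concrete, here is a minimal sketch of the divisor zeta transform and its inverse over mixed-radix exponent vectors; the counts are assumed to be stored in a flat array indexed by the vector of prime exponents, with the radices being the maximum exponents plus one for each prime up to 61 (an assumption about the encoding, not the exact contest code):

```python
def zeta(counts, radices):
    # After this, counts[v] = sum of original counts over all u dividing v,
    # i.e. a prefix sum along every prime-exponent dimension.
    step = 1
    for r in radices:
        block = step * r
        for base in range(0, len(counts), block):
            for off in range(base + step, base + block):
                counts[off] += counts[off - step]
        step = block

def moebius(counts, radices):
    # Inverse of zeta: adjacent differences along every dimension, in reverse.
    step = 1
    for r in radices:
        block = step * r
        for base in range(0, len(counts), block):
            for off in range(base + block - 1, base + step - 1, -1):
                counts[off] -= counts[off - step]
        step = block
```

Between the two transforms one can simply square every entry to go from counts of single submatrices to counts of pairs, exactly as described above.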

Thanks for reading, and check back soon for more!