Wednesday, May 18, 2016

A 400-500 week

Let's take a quick break from the World Finals practice session and take a look at last week's events. Codeforces Round 352 on Wednesday was one of the last chances to practice before this week's event (problems, results, top 5 on the left, analysis). Quite fittingly, three out of the top five spots were taken by World Finals competitors, including yet another first place for subscriber – he is on fire!

TopCoder Open 2016 Round 2A took place a day later (problems, results, top 5 on the left, my screencast). The point values of 400, 500 and 1000 (instead of the usual 250, 500, 1000) hinted that this would not be a usual round, and it did go strangely indeed (self-fulfilling prophecy?..) The 1000 was probably the most "typical" problem of the round, so going for it after the 400 paid off for me. I also managed to get the 500 submitted and even accepted, but there actually exists a test that makes my solution time out. Can you come up with one? Here's the solution (requires TopCoder login). Bonus points if your test makes it time out even after shuffling the vertices – I don't know how to do that :)

Here's the statement of the hard problem that saved my day: you have 40000 tokens which you can spend using 15 vending machines. Each machine takes tokens, and sometimes gives a prize in response (but sometimes not). The probability of getting a prize after inserting the K-th token, given that the first K-1 attempts produced no prize, is min(1, K^2/L_i^2), where L_i is a constant that differs between vending machines. After you get a prize from a machine, its state resets to the initial one (so K in the above formula becomes the number of tokens inserted after that). Your strategy must be: pick some vending machine, keep inserting tokens into it until you get a prize, then pick a different vending machine, insert tokens until you get a prize there, then pick a different vending machine again (this one may coincide with a previously used machine, but not with the machine that produced the last prize), and so on until you run out of tokens. The prizes from the i-th vending machine have value V_i. How do you maximize the total expected value of all prizes you get?
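
To get a feel for the formula (a sketch with my own naming, not part of the problem), here's how the per-token hazard min(1, K^2/L^2) turns into the distribution and expectation of the number of tokens one prize costs:

def prize_distribution(L):
    """P(first prize arrives exactly on the K-th token) for K = 1..L.
    The hazard min(1, K^2/L^2) reaches 1 at K = L, so the support is finite."""
    probs = []
    survive = 1.0  # probability of no prize among the first K-1 tokens
    for K in range(1, L + 1):
        p = min(1.0, K * K / (L * L))
        probs.append(survive * p)
        survive *= 1.0 - p
    return probs

dist = prize_distribution(10)
print(sum(dist))                                  # 1.0: a prize always arrives
print(sum(K * p for K, p in enumerate(dist, 1)))  # expected tokens per prize

The real difficulty, of course, is the optimization over machine orders on top of these per-machine distributions.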

Thanks for reading, and keep checking my twitter for ICPC 2016 updates!

Tuesday, May 17, 2016

A Polish week

TopCoder SRM 690 took place in the very early hours of Wednesday, May 4th (problems, results, top 5 on the left). Snuke has managed to score 50 additional points during the challenge phase, and that’s what allowed him to jump from third to first place – congratulations on the first SRM victory!

VK Cup 2016 Round 3 on Saturday selected 20 lucky teams to advance to the onsite finals in St Petersburg (problems, results, top 5 on the left, online mirror results, analysis). The rating favorite team “Never Lucky” bounced back from the relatively weak 4th place showing in Round 2 to win this round by over 1000 points – great job! Nevertheless, there are many other strong teams in the top 20 who will make sure the St Petersburg final round isn’t easy for subscriber and tourist.

Google Code Jam Round 1C selected the final 1000 participants for Round 2 (problems, results, top 5 on the left, analysis). Among the top scorers were the reigning champion Gennady.Korotkevich and linguo, who doesn't participate in competitions other than Google Code Jam but nevertheless does very well each year – yet neither could take the first place from artberryx – congratulations on the win!

Finally, Russian Code Cup 2016 Qualification Round 1 wrapped up the tournament-heavy weekend late on Sunday (problems, results, top 5 on the left, analysis). Subscriber pulled out another victory just a few days before the ACM ICPC World Finals, suggesting that the St Petersburg ITMO team is still very much in the running for the top spots there – way to go!

The last problem E required one to combine two relatively standard but normally unrelated algorithms into one solution: given two trees with at most 50 vertices in each one, find the size of the largest connected subtree in the first tree that has an isomorphic connected subtree in the second tree.
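
Rooted-tree isomorphism is presumably one of the two standard building blocks alluded to; as a refresher, here's a minimal sketch (my own illustration, not the official solution) of the classical AHU canonical encoding, which decides rooted-tree isomorphism:

def canonical_code(adj, root, parent=None):
    """AHU encoding: two rooted trees are isomorphic iff their codes match."""
    children = sorted(canonical_code(adj, c, root) for c in adj[root] if c != parent)
    return "(" + "".join(children) + ")"

# A path 1-2-3 looks different when rooted at the middle vs. at an end:
adj = {1: [2], 2: [1, 3], 3: [2]}
print(canonical_code(adj, 2))  # (()())
print(canonical_code(adj, 1))  # ((()))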

In my last summary, I've mentioned a problem that I couldn't solve: you are given two strings of the same length consisting of lowercase English letters, and want to transform the first string into the second one. You are allowed operations of two kinds: changing one character into another takes 1 second, and changing all occurrences of some letter into some other letter (the same for all occurrences) takes C seconds. What is the fastest way to do the required transformation?

Tomek Jarosik has posted a link to the analysis from this task's original contest in the comments. Here's my rough understanding.

First of all, it's not hard to see that we can do all group changes before all individual changes – if we're going to waste 1 second on a given character anyway, we can do it at the end and change it directly to what we need. Because of this, the task can be reformulated: now we're only allowed to do group changes, each costing C, and need to minimize the total cost of group changes plus the total number of positions in which the resulting string differs from the goal.

Now let's draw a graph where the vertices are the 26 letters. The edges, somewhat counter-intuitively, will not be the group changes we make – instead, let's draw an edge from each letter to the letter that all its occurrences will change to after all group changes. For example, if we change all As to Bs, then all Cs to As, and then all Bs to Cs, then our graph will have edges A->C, B->C, C->A. Each letter will have 0 or 1 outgoing edge.
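
In code, this "final image" map is just the composition of the group changes; a small sketch (my own naming) reproducing the example above:

from string import ascii_lowercase

def final_images(group_changes):
    """Compose 'change all x into y' operations into the letter -> final letter
    map whose non-identity entries are exactly the edges of the graph."""
    where = {c: c for c in ascii_lowercase}  # current letter of each original one
    for x, y in group_changes:
        for c in where:
            if where[c] == x:
                where[c] = y
    return where

m = final_images([("a", "b"), ("c", "a"), ("b", "c")])
print(m["a"], m["b"], m["c"])  # c c a, i.e. edges a->c, b->c, c->a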

Given such a graph, it's easy to count the total number of positions in which the resulting string differs from the goal: since we know how each letter ends up, we know the resulting string, and can simply count the differences. The other part of the value we minimize is not so clear: what is the minimum number of group changes required to construct the given graph?

The lower bound on the number of group changes is the total number of edges in this graph. Moreover, in most cases this lower bound is actually the answer. To see that, let's consider the structure of our graph. Since each vertex has 0 or 1 outgoing edges, the connected components of this graph are either directed trees, or cycles with directed trees growing into some of their nodes.

In order to find the group operations that construct the given directed tree, we can start from the root (the "sink"), apply the operations for all letters that need to be changed into the root letter, then continue with the letters that need to be changed to those letters, and so on.
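
Here's a tiny sketch of that order (my own helper names), processing a directed tree from its sink outwards:

from collections import deque

def tree_operations(edge, root):
    """edge[a] = b means all a's must end up as b's; root is the sink.
    Changing a child into its parent only after the parent itself has been
    handled guarantees no two distinct letters ever get merged."""
    children = {}
    for a, b in edge.items():
        children.setdefault(b, []).append(a)
    ops, queue = [], deque([root])
    while queue:
        v = queue.popleft()
        for c in children.get(v, []):
            ops.append((c, v))  # "change all c into v"
            queue.append(c)
    return ops

print(tree_operations({"a": "b", "b": "c"}, "c"))  # [('b', 'c'), ('a', 'b')]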

When we have a cycle, we can't do the same. In fact, when a connected component of the graph forms a cycle of length K, we can't really implement the changes using just K group operations – we need K+1 in that case. To see that, notice that the first operation we make can not transform one of the letters in the cycle into another, as that would "merge" two letters together that need to be separate in the end. Instead, we must use an auxiliary letter that's not in the cycle. For example, if we need to build A->B->C->D->A, then we need to do something like: A to Z, then D to A, then C to D, then B to C, then Z to B.

However, when a connected component is not just a cycle – in other words has one or more directed trees going into the cycle's vertices – then we don't need an extra group operation for this cycle. For example, suppose the above A->B->C->D->A cycle also had a E->C incoming edge. Then we can do: B to E, then A to B, then D to A, then C to D, then E to C, handling both the cycle of length 4 and an extra incoming edge in 5 total group operations, and achieving the lower bound. If there's more than one edge going into the cycle, we can handle the rest as separate directed trees after handling the cycle.

There's one more constraint that we need to take care of: the standalone cycle resolution mentioned above needs an auxiliary letter (denoted as Z in the example above). In case there's a letter with an outgoing edge but without incoming edges, it can perform the role of that auxiliary letter: first, handle the component containing it, and then use it as the auxiliary letter for all standalone cycles. If, however, there's no such letter – in other words, if the entire graph consists of standalone cycles and isolated vertices – then it's not clear what to do. After some more thinking, we can notice that if such a graph has at least one standalone cycle (in other words, at least one edge), then it's impossible to construct it at all, since it implements a non-identity permutation on the letters, and our very first group operation will merge two letters together.

Let's summarize what we have reduced our problem to. We need to build a graph on the 26 letters as vertices, with 0 or 1 outgoing edges for every vertex. If we have an edge from letter A to letter B, then its cost is the number of mismatches that yields (the number of positions where the first string has A and the second string does not have B) plus C (for the group operation corresponding to this edge). If we decide not to have an edge from letter A, this also has a cost (the number of positions where the first string has A and the second string does not have A). In addition, we get an extra cost of C for any standalone cycle in our graph, and the graph must not be all standalone cycles and isolated vertices, but it can be all isolated vertices. We need to minimize the total cost.
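
As an illustration (names are mine), the per-letter costs can be tabulated from the two strings in one pass:

from collections import Counter
from string import ascii_lowercase

def letter_costs(s, t, C):
    """stay[a]: cost of giving letter a no outgoing edge;
    move[a][b]: cost of the edge a -> b (mismatches plus C for the operation)."""
    pair = Counter(zip(s, t))
    count = Counter(s)
    stay = {a: count[a] - pair[(a, a)] for a in ascii_lowercase}
    move = {a: {b: C + count[a] - pair[(a, b)]
                for b in ascii_lowercase if b != a}
            for a in ascii_lowercase}
    return stay, move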

Our team got this far at the actual contest, but we couldn't come up with a way to solve this reduced problem. It turns out that solving it relies on one key insight: first, let's build this graph greedily: for each letter, pick the outgoing edge (or lack thereof) that minimizes the contribution to the total cost (taking into account the cost C of having the edge, but not the additional cost for a standalone cycle or the not-all-standalone-cycles restriction). This will give some graph G0. If this graph does not have standalone cycles, then we're done. And in case it does, it turns out that those are the only cycles that we need to take care of!
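
Building G0 greedily, and finding its standalone cycles, might look like this sketch on top of the letter_costs helper above:

from collections import Counter

def greedy_graph(stay, move):
    """G0: for each letter, the locally cheapest choice of outgoing edge (or none)."""
    g0 = {}
    for a in stay:
        b = min(move[a], key=move[a].get)
        g0[a] = b if move[a][b] < stay[a] else None
    return g0

def standalone_cycles(g0):
    """Cycles of g0 whose vertices have no incoming edges from outside the cycle."""
    indeg = Counter(v for v in g0.values() if v is not None)
    cycles, seen = [], set()
    for start in g0:
        path, v = [], start
        while v is not None and v not in seen and v not in path:
            path.append(v)
            v = g0[v]
        if v in path:
            cyc = path[path.index(v):]
            if all(indeg[u] == 1 for u in cyc):
                cycles.append(cyc)
        seen.update(path)
    return cycles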

More precisely, we can relax the problem as follows without changing the answer: instead of getting an extra cost of C for any standalone cycle, we will only get an extra cost of C for any standalone cycle that is present in G0. Similarly, instead of forbidding the graph to consist of only standalone cycles and isolated vertices, we will only forbid the graph to be identical to G0, and only if the latter consists of only standalone cycles and isolated vertices.

To see why the answer doesn't change, consider some optimal solution for the relaxed problem. Suppose it has some unexpected standalone cycle that is not present in G0. At least one vertex on this cycle has a different outgoing edge in G0. Let's change that edge to the one from G0. The total cost will not increase since G0 is locally optimal, and at the same time the standalone cycle will be destroyed, and no new standalone cycles will form (since it's impossible to do that with just one edge change). By continuing this process, we can show that there exists an optimal solution for the relaxed problem that is also a valid solution for the original problem.

Finally, we can solve the relaxed problem using dynamic programming: we process all letters in order, and pick the outgoing edge (or lack thereof) for each of them, while remembering which cycles from G0 we have already broken, and whether our graph is still identical to G0. Since G0 has at most N/2 cycles, the running time is O(2^(N/2) * N^2), which is very small for N=26.
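
Here's a compressed sketch of that DP under my reading of the analysis (state: bitmask of G0's standalone cycles already broken, plus a flag for "still identical to G0"); an illustration of the recurrence rather than a tuned contest solution:

def solve_reduced(stay, move, g0, cycles, C):
    """Minimize edge costs + C per G0 standalone cycle left intact,
    forbidding the graph from equaling a G0 that is all cycles/isolated."""
    letters = sorted(stay)
    cyc_of = {v: i for i, cyc in enumerate(cycles) for v in cyc}
    INF = float("inf")
    dp = {(0, True): 0}  # (mask of broken cycles, identical to G0 so far) -> cost
    for a in letters:
        ndp = {}
        for (mask, same), cost in dp.items():
            for b in [None] + [x for x in letters if x != a]:
                c = stay[a] if b is None else move[a][b]
                nmask = mask
                if a in cyc_of and b != g0[a]:
                    nmask |= 1 << cyc_of[a]  # a cycle edge was changed
                if b in cyc_of and cyc_of.get(a) != cyc_of[b]:
                    nmask |= 1 << cyc_of[b]  # an extra edge enters the cycle
                key = (nmask, same and b == g0[a])
                ndp[key] = min(ndp.get(key, INF), cost + c)
        dp = ndp
    g0_bad = (any(v is not None for v in g0.values())
              and all(g0[a] is None or a in cyc_of for a in letters))
    best = INF
    for (mask, same), cost in dp.items():
        if same and g0_bad:
            continue  # the forbidden all-cycles-and-isolated-vertices graph
        intact = len(cycles) - bin(mask).count("1")
        best = min(best, cost + C * intact)
    return best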

Thanks for reading, and check back soon for last week's summary. Also make sure to check my twitter for the ACM ICPC 2016 World Finals updates!

Thursday, May 5, 2016

A 26x26 week

First of all, let's come back to the Grand Prix of Southern Caucasus, which wrapped up the 2015-16 season of the Open Cup the week before last but had its results published only last week (results, top 5 on the left). Team Havka-papstvo ended the season on the highest possible note, solving all problems except the most difficult one, and doing it almost without bugs — great job!

I still have no idea how to solve the aforementioned most difficult problem B — maybe the readers of this blog can help? The problem statement is extremely simple: you are given two strings of the same length consisting of lowercase English letters, and want to transform the first string into the second one. You are allowed operations of two kinds: changing one character into another takes 1 second, and changing all occurrences of some letter into some other letter (the same for all occurrences) takes C seconds. What is the fastest way to do the required transformation?

The strings have at most one million characters, but that's not really important since what matters is how many times each of 26x26 (old letter, new letter) pairs appears.

Since this was the last round of the Open Cup season, the final standings of the cup are now finalized (results, top 5 on the left). Unlike last year, most (7 out of 10) top finishers are coming to the ACM ICPC World Finals in Phuket, which looks to be very exciting!

TopCoder Open 2016 Round 1C on Wednesday presented the last opportunity to qualify for Round 2 and stay in contention for the finals in Washington, DC (problems, results, top 5 on the left). Four top competitors on the scoreboard have solved all three problems, but krijgertje was much faster in doing so — congratulations!

Codeforces Round 349 offered an opportunity to start the weekend competitively on Friday night (problems, results, top 5 on the left, analysis). Problem D was a nice exercise about implementing a problem with many possible cases in such a way that (almost) no case analysis is present in the solution. It went like this: you are given four points on the plane. You can shift each point either horizontally or vertically, but not in both directions. Different points can be moved in different directions — for example, three points can be moved vertically and the fourth one horizontally. Your goal is to get the points to become the corners of an axis-parallel square, while minimizing the maximum distance traveled by one of the points.

Finally, Google Code Jam 2016 Round 1B on Saturday allowed those who failed a problem in Round 1A a second chance (problems, results, top 5 on the left, analysis). ikatanic shared the first place with rng..58, but he has also appeared in the top 5 in another contest this week, so he wins by tie-breaking :) Congratulations to both!

In my previous summary, I've mentioned the 0.636 problem: you are given 2n points on the plane, no three points on the same line, and a matching between them – n segments connecting the points to each other in such a way that each point is the endpoint of exactly one segment. Your goal is to find another matching with an additional property: the segments must not intersect. That would be a classical problem, but here there’s an extra constraint that changes the problem completely: the total length of the segments in your matching must be at least 0.636 (zero point six three six) times the total length of the segments in the given matching. n is at most 100.

The first step in solving this problem is figuring out where the strange number 0.636 comes from. It's funny that just googling that constant yields the answer: this number is slightly less than 2/π, which, in turn, is the expected length of a projection of a segment of length 1 onto a random line.

This leads us to the first step of the solution: if we pick a random line and project our matching onto it, the expected total length of the projection will be 2/π times the total length of the original matching. Because of this, we can try several random lines and quickly find a line L whose projection is at least 0.636 times the total length of the original matching.
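
A minimal sketch of this step (my own helper names); it samples random directions until one captures at least the 0.636 fraction:

import math, random

def projection_total(segments, theta):
    """Total length of the segments' projections onto direction theta."""
    ux, uy = math.cos(theta), math.sin(theta)
    return sum(abs((x2 - x1) * ux + (y2 - y1) * uy)
               for (x1, y1), (x2, y2) in segments)

def find_good_line(segments):
    """E[projection] = (2/pi) * total > 0.636 * total, so this ends quickly."""
    total = sum(math.dist(p, q) for p, q in segments)
    while True:
        theta = random.uniform(0.0, math.pi)
        if projection_total(segments, theta) >= 0.636 * total:
            return theta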

How does this help? Our goal now will be to find a non-intersecting matching with total length greater than or equal to the total length of this projection. Since the total length of a matching is greater than or equal to the total length of its projection, it is enough to find a non-intersecting matching whose projection onto the same line L is greater than or equal in length to the projection of the original matching. And for that, in turn, it is enough to find a non-intersecting matching whose projection onto the same line L has the maximum possible total length.

Which matchings between 2n points on a line have maximum possible total length? This is a reasonably easy question: it's those matchings where the left end of each segment is among the n leftmost points on the line, and the right end of each segment is among the n rightmost points on the line. Moreover, any such matching has this largest possible total length!

This reduces our problem to the following: we have 2n points on the plane, n of them colored blue (the ones with the n leftmost projections on L) and the remaining n colored red, and we need to find any non-intersecting matching connecting those points, which is a well-known task with many possible solutions. One solution starts with any matching and then repeatedly "flips" intersecting segments until there are none (this terminates since the total length strictly decreases); another takes advantage of the fact that the groups of n blue and n red points are separated from each other in our case, and thus we can find a common tangent to their convex hulls, pair the corresponding red and blue points, and repeat.
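
Here's a compact sketch of the first approach (my own naming); since no three points are collinear and segments never share endpoints, a proper-intersection test is all we need:

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def segments_cross(p, q, r, s):
    """Proper crossing of segments pq and rs (no collinear triples by assumption)."""
    return ((cross(r, s, p) > 0) != (cross(r, s, q) > 0) and
            (cross(p, q, r) > 0) != (cross(p, q, s) > 0))

def untangle(blue, red, match):
    """match[i]: red partner of blue[i]. Swapping the partners of two crossing
    segments strictly shrinks the total length (triangle inequality), so the
    loop terminates with a non-intersecting matching."""
    changed = True
    while changed:
        changed = False
        for i in range(len(blue)):
            for j in range(i + 1, len(blue)):
                if segments_cross(blue[i], red[match[i]], blue[j], red[match[j]]):
                    match[i], match[j] = match[j], match[i]
                    changed = True
    return match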

Thanks for reading, and check back next week!

Monday, April 25, 2016

0.636 of a week

TopCoder SRM 689 on Saturday presented two very creative problems (problems, results, top 5 on the left). The medium problem required mostly mathematical thinking, while the hard problem involved both mathematics and algorithms. Scott Wu has figured out both, and didn’t forget to add 150 challenge points to stay in the first place – way to go!

The more algorithmic one went like this: you are given 2n points on the plane, no three points on the same line, and a matching between them – n segments connecting the points to each other in such a way that each point is the endpoint of exactly one segment. Your goal is to find another matching with an additional property: the segments must not intersect. That would be a classical problem, but here there’s an extra constraint that changes the problem completely: the total length of the segments in your matching must be at least 0.636 (zero point six three six) times the total length of the segments in the given matching. n is at most 100.

This problem is yet another example of the brilliant problems that the TopCoder admins and problem authors manage to come up with time and time again. Huge kudos to cgy4ever and all the TopCoder problem authors!

VK Cup 2016 Round 2 took place a day later (problems, results, top 5 on the left, results of the parallel round without the first problem). Team SPb SU Base have continued their super impressive recent form, both in individual and in team competitions – congratulations to -XraY- and ershov.stanislav!

Last week, I described a nice problem: you are given a 20x100000 table of zeroes and ones. You are allowed to invert (replace 0s with 1s and vice versa) rows and columns of this table. What is the smallest number of ones you can make the table contain?

The table is hugely asymmetrical: there are a lot of columns but just 20 rows – and we definitely need to exploit that. Since the order of inversions doesn't change the result, let's say we start by inverting the columns, and then handle the rows. Consider a column. Let's say the ones in it form a binary number x. Let's also say that the ones corresponding to the rows we're going to invert form another binary number y. Then the final number of ones in this column is the number of ones in x^y (here ^ stands for bitwise exclusive-or); let's denote it ones(x^y). Note that we can also invert the column before inverting the rows, in which case we will get ones((~x)^y), which is actually equal to n-ones(x^y), where n is the number of rows and ~ is the bitwise negation of an n-bit number. Our goal is to pick the y that minimizes the sum of min(ones(x^y), n-ones(x^y)) over all columns.

Note that the binary numbers are used here purely for the purposes of an efficient and easy-to-code implementation – fundamentally we’re just operating with sets of rows.

The key insight is: let’s use dynamic programming to count for each y and each number k from 0 to n the number of x’s such that ones(x^y)=k. Let’s call it a(y, k). It’s easy to see that knowing these numbers allows counting the above sum easily, and thus picking the value of y that minimizes it.

For k=0, this is easy – we just need to count the number of times each bit pattern appears as a column (we can just iterate over all 100000 columns and increment the corresponding value in an array). For k=1, it's also somewhat easy: a(y, 1) is equal to the sum of a(y', 0) over all y' that differ from y in exactly one bit. Continuing this idea to larger values of k is not as easy, though: we would count each way of flipping two bits twice, depending on the order in which we flip the bits, and even more worryingly we might flip the same bit twice, thus declaring that a number differs from itself in two bits. One can probably account for this via some clever combinatorics, but there's a better way.

It’s actually quite logical: since we don’t want to overcount different orders of flipping the bits, let’s always flip the bits in a pre-determined order. In other words, we add another dimension to our dynamic programming: b(y, k, t) is the number of x’s (remember, those are the bit patterns of columns of the input matrix) such that ones(x^y)=k, with an additional restriction that x and y may only differ in the first t bits. Now we can easily see that b(y, k, t+1) = b(y, k, t) + b(y ^ (1 << t), k - 1, t): we either flip the t-th bit, or we don’t. And the number a(y, k) that we need is simply found as b(y, k, n). This solution needs O(2^n * n^2) very simple operations, which fits in time for n=20.
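
In code, the recurrence fits in a few lines (a sketch: updated in place over t, with k descending so that the k-1 row still holds the previous layer's values — in Python this is only practical for small n, but the structure is the point):

def min_total_ones(columns, n):
    """columns: each column's bits packed into an integer x; returns the
    minimum achievable number of ones.  b[y][k] ends up equal to a(y, k)."""
    size = 1 << n
    b = [[0] * (n + 1) for _ in range(size)]
    for x in columns:
        b[x][0] += 1
    for t in range(n):
        bit = 1 << t
        for y in range(size):
            if y & bit:
                continue
            z = y | bit
            for k in range(n, 0, -1):  # descending: row k-1 is still "old"
                b[y][k], b[z][k] = b[y][k] + b[z][k - 1], b[z][k] + b[y][k - 1]
    return min(sum(min(k, n - k) * cnt for k, cnt in enumerate(row)) for row in b)

# 2x3 toy table with columns 11, 01, 00 -> one remaining one is optimal
print(min_total_ones([0b11, 0b01, 0b00], 2))  # 1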

As shown in the picture on the left, this dynamic programming is similar in spirit to the discrete Fourier transform. I'm not sure if that similarity has something meaningful behind it.

There was also an Open Cup round this Sunday, sharing the problems with an international (ex-USSR) onsite competition called Vekua Cup, but its results are still not available, so I will cover it next week.

I was preparing this post on a plane, so instead of the usual nature photo I only have this :) Thanks for reading, and check back next week!

Sunday, April 17, 2016

A greedy week

This week started with TopCoder Open 2016 Round 1B on Tuesday (problems, results, top 5 on the left). Just like in Round 1A, the top of the scoreboard featured many contestants who don't compete much anymore — but the first place went to Kankuro who's actually quite active in other competitions, just doesn't like doing TopCoder SRMs for some reason. Congratulations, Vlad!

TopCoder SRM 688 followed on Friday (problems, results, top 5 on the left, my screencast). The main story of this round was that of the 1000-pointer, which went like this: you are given a sequence of opening and closing parentheses. You want to color each parenthesis either red or blue, in such a way that all red parentheses form a correct parentheses sequence, and all blue parentheses form a correct parentheses sequence. Out of all ways to do that, you want to pick the one that minimizes the total cost of the two resulting parentheses sequences. The cost of a parentheses sequence, in turn, is defined as the sum of the costs of all its matching parentheses pairs. And finally, the cost of a matching parentheses pair is defined as the square of the maximum level of nesting of parentheses present inside that pair (i.e., the cost of "()" is 1, the cost of the outer parentheses pair in "(()())" is 4, and the total cost of "(()())" is 6).

Almost all accepted solutions for this problem, to the best of my knowledge, implement the following greedy algorithm: go from left to right, and color each parenthesis (either red or blue) in such a way that the balance of the red sequence is always greater than or equal to the balance of the blue sequence, and they differ by at most 1. It's not hard to see that this rule determines the color of each parenthesis uniquely.

For example, consider input string "((())()(()))". Here's how the greedy solution works on this input:
Char  Color  Red String  Red Balance  Blue String  Blue Balance
---------------------------------------------------------------
   (    red           (            1                          0
   (   blue           (            1            (             1
   (    red          ((            2            (             1
   )    red         (()            1            (             1
   )   blue         (()            1           ()             0
   (   blue         (()            1          ()(             1
   )   blue         (()            1         ()()             0
   (   blue         (()            1        ()()(             1
   (    red        (()(            2        ()()(             1
   )    red       (()()            1        ()()(             1
   )   blue       (()()            1       ()()()             0
   )    red      (()())            0       ()()()             0
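
The greedy itself is only a few lines; here's a sketch (my own naming) that reproduces the trace above:

def greedy_color(s):
    """Keep red balance >= blue balance with a difference of at most 1;
    this forces every choice, as in the table above."""
    red = blue = 0
    colors = []
    for ch in s:
        if ch == "(":
            if red == blue:
                red += 1; colors.append("red")    # open red to keep it in front
            else:
                blue += 1; colors.append("blue")  # red is ahead: even out
        else:
            if red > blue:
                red -= 1; colors.append("red")    # close red to even out
            else:
                blue -= 1; colors.append("blue")  # equal balances: close blue
    return colors

print(greedy_color("((())()(()))"))
# ['red', 'blue', 'red', 'red', 'blue', 'blue', 'blue', 'blue',
#  'red', 'red', 'blue', 'red']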

Intuitively, it seems like this algorithm splits the parentheses most evenly into two halves, so the cost of each parenthesis pair will be as small as possible, and thus the total cost will also be as small as possible. A formal proof, however, evades me. The problemsetters also didn't expect this greedy solution, and expected something much more complex. Can you prove this solution correct, or find a counterexample?

Some further points:
  • This solution passes a stress test using random sequences of length up to 16.
  • It seems that this solution also works with any other non-decreasing cost function (from nesting level to cost).
  • Minor details are important: for example, if you don't always maintain that it's the red sequence that has the higher balance when the total balance is odd, and allow the blue sequence to have the higher balance sometimes, then the solution isn't optimal anymore. For example, a somewhat logical implementation is to always put the next parenthesis into the red sequence if the current balances are equal, regardless of whether it is closing or opening. It fails, for example, on the following testcase: "((())(()))". It splits it as "(())(())" + "()" for a total cost of 4+4+1+1+1=11, while "(()())" + "()()" is better at 4+1+1+1+1=8.

At the same time as the SRM, the CROC Championship 2016 Final Round took place in Moscow (problems, results, top 5 on the left, online round results with 4/5 problems in common, partial analysis in Russian). The dynamic scoring system at an onsite competition with just 50 contestants meant that just the two most difficult problems really mattered. Gennady was able to claim the victory by solving one of those problems well before anybody else could — congratulations!

Problem C was the most interesting for me in this round, in particular because it had a very simple and beautiful statement: you are given a 20x100000 table of zeroes and ones. You are allowed to invert (replace 0s with 1s and vice versa) rows and columns of this table. What is the smallest number of ones you can make the table contain?

Google Code Jam 2016 continued with Round 1A on Saturday (problems, results, top 5 on the left, analysis). nika has managed to solve all three problems in just 21 minutes — amazing! Congratulations to everybody who advanced to Round 2; those who didn't advance have two more chances in the upcoming Rounds 1B and 1C.

Thanks for reading, and check back next week!

Monday, April 11, 2016

A binary week

Last week was reasonably calm, TopCoder SRM 687 being its highlight (problems, results, problem discussion). Once again, tourist didn't give others many chances, but matthew99a has managed to shine in a slightly different way — by getting into the top 5 for the third SRM in a row. Congratulations to both!

Now I want to come back to the problem I mentioned in my previous summary. However, since I published that summary just a few minutes ago, I'd like to mention once again that the problem is super interesting, and I encourage you to try solving it yourself before reading the solution!

The problem was to interactively guess a hidden number between 1 and n (which was up to 10^9) spending at most log2(n+1)+0.1 steps on average (notice the lack of ceil() around the log). One guess was asking "is the hidden number less than x?" for any value of x. The guessing was repeated 10000 times with possibly different (and possibly the same) hidden numbers to compute the average.

Suppose k=ceil(log2(n)). Then the standard binary search requires k-1 steps for 2^k-n hidden numbers, and k steps for the remaining n-(2^k-n)=2n-2^k hidden numbers. On average, this fits under the log2(n)+0.1 limit (which by itself doesn't seem to be an obvious mathematical statement, but it feels right and can be verified empirically, and that's probably how the +0.1 part was chosen by the problemsetter); however, the hidden numbers are not chosen randomly, so we might not get the average treatment.

On the other hand, there are many possible binary search trees with such structure (2^k-n leaves on level k-1, and 2n-2^k leaves on level k). That leads to the following solution framework: we need a family of such binary search trees such that every hidden number is equally likely to be on level k across the entire family (and thus equally likely to be on level k-1, too). Our strategy will then be to pick a tree from the family at random, and then do binary search according to it, giving us the required average number of steps irrespective of what the hidden numbers are!

How does one find such a family? Well, after formulating so clearly what we need to find, the actual construction is not that difficult. It's quite clear that the only restriction we have on the tree is that leaves on level k have to come in consecutive pairs; modulo that restriction, we can construct any possible binary search tree. More precisely, let t=n-2^(k-1) be the number of those consecutive pairs of leaves on level k.

One tree in our family will have numbers (1, 2), (3, 4), ..., (2t-1, 2t) as level-k leaves. Another tree will have (3, 4), (5, 6), ..., (2t+1, 2t+2). We can continue like this, taking the leaf numbers modulo n until we make a full cycle, and it's easy to see that the resulting family has each leaf at level k the same number of times (the example for n=6 is on the left).
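
A quick sketch (my own naming) that builds this family and verifies the uniformity for n=6 (where k=3 and t=2):

from collections import Counter

def pair_family(n, t):
    """t consecutive pairs of level-k leaves, shifted by 2 and taken modulo n."""
    family = []
    for shift in range(0, n, 2):
        pairs = [tuple(sorted((shift + 2 * i + d) % n + 1 for d in (0, 1)))
                 for i in range(t)]
        family.append(pairs)
    return family

family = pair_family(6, 2)
print(family)  # [[(1, 2), (3, 4)], [(3, 4), (5, 6)], [(5, 6), (1, 2)]]
counts = Counter(x for tree in family for pair in tree for x in pair)
print(counts)  # each of 1..6 sits on level k in exactly 2 of the 3 trees

Picking one of these trees uniformly at random therefore gives every hidden number the same expected depth of (2*3+1*2)/3 = 8/3 ≈ 2.67 queries, under the log2(7)+0.1 ≈ 2.907 budget.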

This approach requires n to be even, as otherwise at some point we'd need to group the first and last numbers into one consecutive pair, and we can't do that. Here's where the "+1" part of the problem statement comes into play: since we're allowed log2(n+1)+0.1 guesses on average, and not log2(n)+0.1, we can afford to just solve the problem for n+1 when n is odd.

Thanks for reading, and check back next week!

A doubles week

The March 28 - April 3 week was ignited by Monday's VK Cup 2016 Round 1 on Codeforces (problems, results, top 5 on the left, parallel round results, analysis). Tourist and subscriber left little doubt as to who's the favorite of the entire VK Cup, solving five problems in 1 hour and 6 minutes, while it took other teams at least 1 hour and 41 minutes to do the same!

Just a few hours later, in the early hours of Tuesday morning TopCoder SRM 686 allowed algorithmists from other time zones to enjoy some problem solving (problems, results, top 5 on the left, solutions discussion). jcvb has denied matthew99a his second consecutive win — congratulations to both on great performance!

On Sunday, the penultimate Open Cup 2015-16 round, the Grand Prix of Moscow, challenged teams with some really difficult, if a bit tedious, problems (results, top 5 on the left, analysis). Team SPb SU Base demolished the competition, solving half of their problems in the last hour!

My personal favorite in this round is problem A. Its submission stats: 296 submitted, 0 accepted. And the problem was just about implementing binary search on n choices in log2(n) time :)

More precisely, one needed to interactively guess a hidden number between 1 and n (which was up to 10^9) spending at most log2(n+1)+0.1 steps on average. One guess was asking "is the hidden number less than x?" for any value of x. The guessing was repeated 10000 times with possibly different (and possibly the same) hidden numbers to compute the average.

Where's the catch? In the lack of rounding. Normal binary search requires ceil(log2(n)) steps in the worst case, and we can't afford that much here. If the hidden number was also random, then we'd get the required number of steps on average — but it's not.

For example, suppose n=6. Normal binary search requires 2 or 3 steps: it starts by splitting the numbers into [1,2,3] and [4,5,6]. Suppose the hidden number is in [1,2,3]. Then it splits the numbers into, say, [1] and [2,3], and in case the hidden number is 1, we've found it in 2 steps, while for 2 and 3 we require 3 steps. We could've split the triple into [1,2] and [3] instead, solving 3 in 2 steps and 1 and 2 in 3 steps, so one might expect that randomizing this choice will make our solution work under our limit of log2(7)+0.1=2.907... on average. But on closer examination, we always require 3 steps for the number 2 (and for 5)! So if we're always asked to find 2 or 5, then we will not fit under the required average number of steps.
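
That claim is small enough to verify exhaustively; here's a sketch (my own naming) enumerating every binary search decision tree whose first query makes the 3+3 split:

def depth_profiles(lo, hi):
    """Every achievable {number: #queries} map over binary search trees on [lo, hi]."""
    if lo == hi:
        return [{lo: 0}]
    result = []
    for x in range(lo + 1, hi + 1):  # query "is the hidden number < x?"
        for left in depth_profiles(lo, x - 1):
            for right in depth_profiles(x, hi):
                result.append({v: d + 1 for half in (left, right)
                               for v, d in half.items()})
    return result

# Which numbers can ever be found in 2 queries when the first query is "< 4"?
shallow = {v
           for left in depth_profiles(1, 3)
           for right in depth_profiles(4, 6)
           for half in (left, right)
           for v, d in half.items() if d + 1 == 2}
print(sorted(shallow))  # [1, 3, 4, 6] -- 2 and 5 always need all 3 queries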

In other words, for n=6 we sometimes have to split the numbers into 2+4 instead of 3+3! And to add insult to injury, we also have to take our previous decisions into account as the binary search narrows down. For example, suppose n=5. Most probably we'd start with a 2+3 or 3+2 split for the first request, for symmetry reasons with probability 50% for each choice. Note that the numbers are no longer all equal: if number 3 is hidden, then we'll always get the bigger segment as the answer to our query! So in order to get a good number of steps on average when 3 is hidden, we have to prefer splitting the remaining numbers as [1,2] vs [3], as opposed to [1] vs [2,3]!

This is where our team got stuck at the actual contest :) Can you see how to progress further?

Thanks for reading, and check back soon for last week's summary and this problem's solution!