Thursday, April 23, 2015

A week with different analysis approaches

Before we turn to this week’s events, let’s come back to the Qualification Round of Google Code Jam from last week that I somehow forgot to mention (problems, results, top 5 on the left, analysis). The round lasted 27 hours and the time of submissions did not matter for qualification, but some contestants still tried to be as quick as possible – congratulations to kyc on being the fastest!


The Code Jam continued this week with Round 1A (problems, results, top 5 on the left). This time one needed to be fast or solve all three problems in order to get into the top 1000 and advance. Sergey ‘Burunduk1’ Kopeliovich demonstrated really impressive speed by solving all three problems in just 23 minutes – awesome job!

There were also plenty of other contests this week. On Tuesday, Codeforces Round 299 (problems, results, top 5 on the left) challenged everybody with some tricky-to-get-right problems – a lot of solutions failed the system test, including two of mine. Problem C highlighted one of the beautiful, if a bit standard, ideas of transitioning an algebraic problem into a geometric one. Several duathletes are competing in a swimming+running duathlon. You know the swimming speed and the running speed of each duathlete, but you know neither the swimming distance nor the running distance. Which duathletes could win or at least share the first place for some combination of swimming and running distances?


VK Cup 2015 Round 2 was also hosted by Codeforces (problems, results, mirror results, top 5 on the left). The “Never Sorry” team took matters into their own hands this time, being the only team to solve all problems and adding 5 challenges on top of that – amazing! Of course, this is still an early round and the real battle will be at the onsite competition in July.

SRM 656 was TopCoder’s event of the week (problems, results, top 5 on the left). baklazan4247 was the only contestant able to solve the hardest problem correctly, but that was not enough for the first place as he skipped the medium difficulty problem, and xudyh did not, solving it in just 8 minutes and earning the 3000+ "target" rating as a result - congratulations! He seems to have a blog in Chinese about his programming contest experiences, but the auto-translated version doesn't seem terribly accurate :)

And finally, the Open Cup Grand Prix of Three Capitals took place on Sunday (results, top 5 on the left). Problem F is a very nice example where using randomized algorithms is much more appropriate than deterministic ones. It went like this: you are given n 4-tuples of points on the plane, and m rectangles with sides parallel to coordinate axes. For each rectangle, you need to check if it contains exactly 2 points from each 4-tuple. In other words, if there’s at least one 4-tuple with 0, 1, 3 or 4 points inside this rectangle, then its answer is “No”, otherwise it’s “Yes”. Can you see how randomness makes this problem easy? On the other hand, can you see a deterministic solution?


Also last week in an offline discussion with Maxim Buzdalov, we brought up the following topic: what’s the best way to do problem analysis at ACM ICPC training camps? The Russian “gold standard” of Petrozavodsk is to use chalk and blackboard to explain the main idea of the problem solution on the stage live, also recording the explanation on camera for future reference. Maxim advocated for a slightly different approach, used for example at the NEERC: work through the analysis in advance, preparing a presentation with one or two slides per problem listing the key points, then talk through them during the actual analysis. The blackboard analysis tends to be more connected with the audience, since they feel as if they’re creating the solution together with the presenter, and thus are more likely to suggest fixes and improvements; it also requires comparatively little preparation from the presenter. At the same time, the presentation helps a lot in case the presenter’s or the contestant’s English is not very good so verbal communication is much less effective, and it also leaves a much more convenient reference to use later compared to the video recording; the greater advance preparation makes it easier to avoid incorrect solutions and to present different approaches. Which way do you prefer? Vote at my Google+.

Thanks for reading, and see you next week!

Thursday, April 16, 2015

This week in competitive programming

This week contained two TopCoder rounds. The first one was TopCoder SRM 655 on Thursday (problems, results, top 5 on the left). Gennady 'tourist' showed really outstanding performance getting the highest score on all 3 problems, and the highest gain on challenges (+350) to boot - awesome job!

Here's one of the problems Gennady was the fastest to solve - in fact, he spent just 2 minutes and 44 seconds on reading the problem statement and implementing the correct solution. You're given a 20x20 square board initially painted white, and can repeatedly pick a kxk square on it and paint it either black or white, possibly over previous painting. Is it possible to obtain the given picture this way?

TopCoder Open 2015 Round 1A followed on Saturday (problems, results, top 5 on the left). Top 250 active contestants were given a bye for this round, but still the competition was tough and Sevenkplus came out on top - congratulations!

The 24-hour Deadline24 contest has also happened this week in Czeladź (results, top 5 on the left). The amazing team of Psyho, marek.cygan and Mojito1 has won - congratulations! Check out Psyho's writeup on one of the problems - it's a very interesting read.

And finally, here's some auto-awesomed spring Zürichsee. See you next week!

Tuesday, April 7, 2015

This week in competitive programming

ZeptoLab Code Rush 2015 was the only contest this week (problems, results, top 5 on the left). Gennady got a clear first place with another flawless performance - congratulations!

One of the easy problems, problem C, was a good example of the argument that leads to square roots in asymptotic complexity of various algorithms. The problem statement was very simple: you are given two types of candy, one costs X per item and each item brings P units of joy, and the other costs Y per item and each item brings Q units of joy. What's the maximum amount of joy you can get after spending at most Z? Of course, you can't buy partial items, as otherwise we can simply buy the candy with the best joy/cost ratio. Can you solve this problem in O(sqrt(Z))? How about O(poly(log(Z)))?

One of the problems I described last week was about counting strings of given length with the given value of a polynomial hash. The way to solve it was to notice that once we know the hash of the first n/2 characters, let's say it's equal to X, and the hash of the last n/2 characters, let's say it's equal to Y, the hash of the entire string is simply X*p^(n/2)+Y. In other words, in order to find the number of ways to reach the given hash value, we need to sum the products of numbers of ways to reach two hash values for the halves that sum to the given value - now we can notice that this is equivalent to multiplying polynomials. Polynomials can be multiplied in O(NlogN) using Fast Fourier Transform. Moreover, very conveniently (but not surprisingly), one needed to output the answer modulo a prime 998244353=2^23*119+1, which means that there exists a 2^23-th root of unity modulo that prime, and thus the FFT algorithm can work just fine.
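To make the halving idea concrete, here is a minimal Python sketch. It assumes the hash convention a_0+a_1*p+a_2*p^2+... from the problem statement, so the power of p lands on the second half, and it assumes characters are encoded as small integers (the `alphabet` parameter is my own illustrative choice). The helper names `combine` and `count_by_hash` are hypothetical, and the double loop in `combine` is a plain O(m^2) stand-in for the FFT-based multiplication, so this shows the structure of the solution rather than a version that fits the real limits.

```python
MOD = 998244353  # equals 119 * 2**23 + 1


def combine(left, right, left_len, p, m):
    """Counts for the concatenation of a left part and a right part.

    left[x] / right[y] are the numbers of strings with hash x / y.
    With hash = a_0 + a_1*p + ..., the right part is shifted by p**left_len.
    """
    shift = pow(p, left_len, m)
    out = [0] * m
    for x, cx in enumerate(left):
        if cx == 0:
            continue
        for y, cy in enumerate(right):
            if cy:
                h = (x + shift * y) % m
                out[h] = (out[h] + cx * cy) % MOD
    return out


def count_by_hash(n, p, m, alphabet):
    """out[h] = number of strings of length n with hash h (mod MOD)."""
    if n == 0:
        out = [0] * m
        out[0] = 1  # the empty string has hash 0
        return out
    half = count_by_hash(n // 2, p, m, alphabet)
    full = combine(half, half, n // 2, p, m)
    if n % 2:
        # Append one more character at position n-1.
        single = [0] * m
        for c in alphabet:
            single[c % m] += 1
        full = combine(full, single, n - 1, p, m)
    return full
```

For the real constraints the inner double loop would be replaced by one polynomial multiplication per level, after permuting one side's indices by the shift.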

Here's an interesting addition: this comment (in Russian) describes a way to multiply two polynomials modulo some number that does not have a nice root of unity, so this problem would actually be solvable just fine modulo 10^9+7 or any other modulus of that order of magnitude.

Thanks for reading, and check back next week!

Sunday, March 29, 2015

This week in competitive programming

TopCoder SRM 654 took place in the early hours of Thursday (problems, results, top 5 on the left). Less than half a year after winning an SRM for the first time, Endagorion has now earned his third SRM victory, a feat accomplished by just 50 contestants in the history of TopCoder - congratulations! Next step would be to become the 32nd person to win four SRMs, of course :)

Russian Code Cup 2015 Qualification Round 1 happened on Saturday (problems in Russian, results, top 5 on the left). This was the strongest qualification round since top 200 have qualified and will thus be unable to participate in the next two qualification rounds. Congratulations to Gennady on another flawless victory!

The last problem is worth mentioning at least for its very simple statement: how many strings of length n are there with the given value of a polynomial hash? n is up to 10^6, the polynomial hash is computed with the given base p and modulo m, m is up to 10^4 (more precisely, the hash is equal to a_0+a_1*p+a_2*p^2+... mod m, where a_i is the i-th character of the string, which consists only of lowercase English letters). You need to output the answer modulo 998244353.

Finally, Open Cup 2014-15 Grand Prix of America happened on Sunday as usual (results, top 5 on the left, results of another contest with the same problemset but different time limits). One of the tricky problems, problem G, was about constructing a long string s using a short string t. We start with just the string t. Now we insert another occurrence of t anywhere (possibly before the first or after the last character), and repeat this process until we obtain string s. In the sample input, s was "hhehellolloelhellolo" and t was "hello". Given the string s with at most 200 characters, what's the shortest string t that could've been used to construct it?

Thanks for reading, and see you next week! Please also find a photo with a spring - if a bit gloomy - feeling on the left :)

Sunday, March 22, 2015

This week in competitive programming

TopCoder SRM 653 has ignited this week's contests on Tuesday (problems, results, top 5 on the left). Egor and Kazuhiro were in their own league with amazingly fast solutions both for the medium and for the hard problem, but Egor has squeezed out the victory during the challenge phase - congratulations!

The most interesting part of this round, in my view, was coming up with a challenge for the easy problem. You were given a sequence of numbers with some numbers replaced by wildcards, and were guaranteed that the sequence before replacement by wildcards consisted of several consecutive segments of equal numbers, where each number is equal to the length of the corresponding segment, for example: 3, 3, 3, 4, 4, 4, 4, 2, 2, 2, 2 (3x3+4x4+2x2+2x2), then 3, *, 3, *, *, 4, 4, *, *, 2, * with wildcards. The problem asked to check if there's more than one way to reconstruct the numbers that were replaced by wildcards, and many people have simply counted the number of ways to reconstruct the numbers, and compared it with 1.

Many of those people used the 32-bit integer type to count the number of ways, but it's not sufficient to count the number of ways for a sequence of length 100, so their solutions might fail if the total number of ways is k*2^32+1. I found one such solution during the challenge phase, and tried to create a testcase to fail it - but could not, and neither did the system test fail it. However, right after the SRM Misha 'Endagorion' posted such a testcase on Codeforces. Can you come up with a tricky testcase without following that link?
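For reference, the counting approach can be sketched as a short dynamic programming over prefixes. This is my own illustration, not any contestant's code: wildcards are represented by `None`, the function name `count_reconstructions` is hypothetical, and the optional `cap` argument shows one simple way to sidestep the overflow issue, since we only care whether the count exceeds 1.

```python
def count_reconstructions(seq, cap=None):
    """Number of ways to fill the wildcards (None) in seq.

    ways[i] counts valid fillings of the first i positions that end exactly
    at a segment boundary. If cap is given, counts are clamped to it, which
    keeps them tiny when we only need to compare the answer with 1.
    """
    n = len(seq)
    ways = [0] * (n + 1)
    ways[0] = 1
    for start in range(n):
        if ways[start] == 0:
            continue
        for length in range(1, n - start + 1):
            block = seq[start:start + length]
            # A segment of this length must consist of the number `length`.
            if all(v is None or v == length for v in block):
                total = ways[start + length] + ways[start]
                ways[start + length] = total if cap is None else min(cap, total)
    return ways[n]


# The example from the text: 3, *, 3, *, *, 4, 4, *, *, 2, *
example = [3, None, 3, None, None, 4, 4, None, None, 2, None]
ambiguous = count_reconstructions(example, cap=2) > 1
```

Python integers do not overflow, so the exact count is safe here; in a language with fixed-width integers the clamped version is the one that avoids the trap described above.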

Codeforces Round 296 (problems, results, top 5 on the left) happened later that day. I've skipped the round, but want to congratulate piob on the amazing victory which he achieved in just 54 minutes out of two hours!

On Saturday, VK Cup 2015 Round 1 pioneered a (relatively) original competition format: two-person teams (problems, results, top 5 on the left). Congratulations to Boris and Adam on the victory! The pre-round favorite team "Never Sorry" led through most of the contest, but had to resubmit the solution for the hardest problem several minutes before the end of the round and dropped to fourth place. The reason for their resubmission? Their solution made an out-of-bounds access, namely tried to reach the (n+1)-st character in an n-character string. Since they were using C++, this would have flown just fine if that was an ordinary string, but they had their own class since the string was constructed implicitly, and it had an explicit assertion for out-of-bounds accesses. Indeed, removing line 87 "assert(false);" from their first submission makes it pass the system test!

I have to admit that this example goes against my philosophy that more strict languages like Java or Pascal lead to higher probability of passing the system test because more bugs can be caught during the coding phase. Of course, this is just one example :)

VK Cup 2015 Round 1 online mirror was held several hours later with a slightly modified problemset (problems, results, top 5 on the left). Congratulations to Ivan on his first victory on Codeforces!

Now, let's come back to the problem I described last week and the new data structure. You are given a tree with at most 10^5 vertices, where each edge has an integer length, and a sequence of 10^5 updates and queries. Each update tells you to color all vertices in the tree that are at most the given distance from the given vertex with the given color. Each query requires you to output the current color of a given vertex.

The data structure as described in the Russian post-match discussion and in another Codeforces comment is called "Centroid Decomposition of a Tree". We start by finding the centroid of the tree: a vertex such that it splits the tree into components of size at most N/2, where N is the number of vertices in the tree. One way to find such a vertex is to pick an arbitrary root, then run a depth-first search computing the size of each subtree, and then move starting from the root to the largest subtree until we reach a vertex where no subtree has size greater than N/2.

Let's mark the centroid with label 0, and remove it. After removing the centroid the tree separates into several parts of size at most N/2. Naturally, now we do the same recursively for each part, only marking the new centroids with label 1, then we get even more parts of size at most N/4, mark their centroids with label 2, and so on, until we reach parts of size 1. Since the size at least halves with each step, the labels will be at most log(N).

The process of construction is displayed in the pictures on the left. The right subtree displays an interval tree analogy, while the left subtree shows that more unusual things can happen.

Now, consider any two vertices A and B in the tree and the path connecting them, and let's find the vertex C with the smallest label on that path. It's not hard to see that the path connecting A and B lies entirely in the part that vertex C was the centroid of in the above process, and that A and B lie in different parts that appear after removing C. So our path is a concatenation of two paths: from C to A, and from C to B.

Finding C given A and B is also easy: let's just keep a link from each vertex to its "parent" in the above process (if our vertex has label K, the parent will have label K-1), and let's repeatedly follow this link either on A or on B, whichever currently has a higher label, until the two coincide.
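Here is a minimal Python sketch of the construction and of the lookup of C described above. The function names and the edge-list input format are my own assumptions for illustration, edge lengths are left out since they are not needed for the labels, and the traversal is iterative only to avoid recursion limits on deep trees.

```python
def build_centroid_decomposition(n, edges):
    """Return (label, parent) for a tree on vertices 0..n-1 (n >= 1)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    removed = [False] * n
    label = [-1] * n
    parent = [-1] * n  # "decomposition parent"; -1 for the label-0 centroid
    size = [0] * n

    def find_centroid(root):
        # Preorder DFS over the part containing root.
        par = {root: -1}
        order = []
        stack = [root]
        while stack:
            v = stack.pop()
            order.append(v)
            for w in adj[v]:
                if not removed[w] and w != par[v]:
                    par[w] = v
                    stack.append(w)
        # Subtree sizes: descendants come later in preorder, so go backwards.
        for v in order:
            size[v] = 1
        for v in reversed(order):
            if par[v] != -1:
                size[par[v]] += size[v]
        total = size[root]
        v = root
        while True:
            heavy = [w for w in adj[v]
                     if not removed[w] and par[w] == v and 2 * size[w] > total]
            if not heavy:
                return v
            v = heavy[0]

    work = [(0, -1, 0)]  # (some vertex of the part, decomposition parent, label)
    while work:
        start, up, lab = work.pop()
        c = find_centroid(start)
        label[c], parent[c] = lab, up
        removed[c] = True
        for w in adj[c]:
            if not removed[w]:
                work.append((w, c, lab + 1))
    return label, parent


def lowest_label_on_path(a, b, label, parent):
    """Vertex C with the smallest label on the path between a and b."""
    while a != b:
        if label[a] >= label[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a
```

On a path with 7 vertices the middle vertex gets label 0, the middles of the two halves get label 1, and the remaining four vertices get label 2, which matches the interval tree analogy.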

Notice that we've chosen O(NlogN) paths in the tree (from each centroid to all vertices in the corresponding part) such that every path is a concatenation of two paths from that set, and we can find those two paths in O(logN) time. Such a decomposition of paths turns out to be useful in many problems.

Now, how does one solve the problem in question? Well, whenever we need to color all vertices B at distance at most D from the given vertex A with color X, we will group possible B's by C - the vertex with the smallest label on the path from A to B, as described above. To find all possible C's, we just need to follow the "decomposition parent" links from A, and there are at most O(logN) such C's. For each candidate C, we will remember that all vertices in its part with distance at most D-dist(A,C) from C need to be colored with color X.

When we need to handle the second type of query, in other words when we know vertex B but not A, we can also iterate over possible candidate C's. For each C, we need to find the latest update recorded there where the distance is at least dist(B, C). After finding the latest update for each C, we just find the latest update affecting B by comparing them all, and thus learn the current color of B.

Finally, in order to find the last update for each C efficiently, we will keep the updates for each C in a stack where the distance decreases and the time increases (so the last item in the stack is always the last update, the previous item is the last update before that one that had larger distance, and so on). Finding the latest update with at least the given distance is now a matter of simple binary search.
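The per-centroid stack is the only non-obvious piece of bookkeeping, so here is a small Python sketch of it in isolation. The class name `CentroidLog` and its methods are hypothetical, and it assumes that updates arrive in chronological order and that the caller has already computed the remaining distance D-dist(A,C).

```python
import bisect


class CentroidLog:
    """Updates recorded at one centroid C.

    Kept as a stack where distances strictly decrease and times increase,
    so any older update with a smaller or equal distance is discarded.
    """

    def __init__(self):
        self.neg_dist = []  # negated distances, increasing, for bisect
        self.entries = []   # (time, color), parallel to neg_dist

    def add(self, dist, time, color):
        if dist < 0:
            return  # the update does not even reach C
        # Older updates covering no more than this one can never win again.
        while self.neg_dist and -self.neg_dist[-1] <= dist:
            self.neg_dist.pop()
            self.entries.pop()
        self.neg_dist.append(-dist)
        self.entries.append((time, color))

    def latest(self, need):
        """Latest (time, color) with distance >= need, or None."""
        # Entries with dist >= need form a prefix; take its last element.
        i = bisect.bisect_right(self.neg_dist, -need)
        return self.entries[i - 1] if i else None


log = CentroidLog()
log.add(5, 1, "red")
log.add(2, 2, "blue")
log.add(3, 3, "green")  # discards the "blue" update
```

A query for vertex B would call `latest(dist(B, C))` on the log of every candidate C and keep the answer with the largest time.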

As usual, I'm expecting that some of you have already known this data structure for ages. Still, I'd love to hear what you think about my explanation above! Also please tell me if you've read a better explanation somewhere else.

And in any case, check back next week!

Tuesday, March 17, 2015

This week in competitive programming

TopCoder SRM 652 took place on Monday (problems, results, top 5 on the left). The round was soon after the flight home from the Hacker Cup, so I've skipped it and thus don't have much to tell about the problems. Adam "subscriber", who has already been featured in top 5 on this blog several times, has won his first SRM - great job!

Here are the solution ideas for the problems from the last week's summary. The first problem described there was about picking the right order to apply skill improvements. The solution is explained very well in the editorial (heading "521D - Shop"), but the basic steps one needed to make were:
  1. All multiplications should be applied in the end, in arbitrary order, and higher multiplier is better than lower multiplier.
  2. All assignments should be applied before all other operations, and we should use at most one assignment per skill. Since we only apply one, we can imagine that we have an addition instead of an assignment: highest value that can be assigned minus initial value.
  3. For each particular skill, higher addition is always better than lower addition, so we should sort the additions in decreasing order and imagine applying them in that order. In this case, for each addition we know for sure the value of the corresponding skill before and after the addition, and thus we can imagine that we have a multiplication instead (new value divided by old value)!
  4. Now all our operations are multiplications, and we should simply sort them in decreasing order.
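The four steps above can be sketched in Python as follows. This is an illustration under my own assumptions rather than the editorial's code: improvements are given as (type, skill index, value) tuples with types 1, 2, 3 for assignment, addition and multiplication, indices are 0-based, and exact fractions are used for comparing multipliers, which is simple but would likely be too slow for the full limits.

```python
from fractions import Fraction

ASSIGN, ADD, MUL = 1, 2, 3


def choose_improvements(skills, improvements, m):
    """Indices of at most m improvements, in an order that is safe to apply."""
    n = len(skills)
    best_assign = [None] * n       # (value, index) of the largest assignment
    adds = [[] for _ in range(n)]  # (delta, index) per skill
    gains = []                     # (multiplier, index) over all skills

    for idx, (kind, skill, value) in enumerate(improvements):
        if kind == ASSIGN:
            if best_assign[skill] is None or value > best_assign[skill][0]:
                best_assign[skill] = (value, idx)
        elif kind == ADD:
            adds[skill].append((value, idx))
        else:
            gains.append((Fraction(value), idx))

    for s in range(n):
        # Step 2: the best assignment acts like an addition.
        if best_assign[s] is not None and best_assign[s][0] > skills[s]:
            value, idx = best_assign[s]
            adds[s].append((value - skills[s], idx))
        # Step 3: additions in decreasing order become multiplications.
        cur = skills[s]
        for delta, idx in sorted(adds[s], reverse=True):
            gains.append((Fraction(cur + delta, cur), idx))
            cur += delta

    # Step 4: take the m largest multipliers that actually help.
    gains.sort(key=lambda g: g[0], reverse=True)
    chosen = [idx for mult, idx in gains[:m] if mult > 1]
    # Apply assignments first, then additions, then multiplications.
    chosen.sort(key=lambda idx: improvements[idx][0])
    return chosen


order = choose_improvements(
    [13, 20],
    [(ASSIGN, 0, 14), (ASSIGN, 1, 30), (ADD, 0, 6), (MUL, 1, 2)],
    3,
)
```

In this small example the result keeps the assignment to 30, the addition of 6 and the multiplication by 2, and drops the assignment to 14, which is worth only 20/19 once the addition has been applied.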
The second problem was about Conway's look-and-say sequence, and the main solution idea is actually described in the linked Wikipedia article as the "cosmological theorem": sooner or later, the sequence separates into several parts that never interact again, and it turns out that the set of all possible strings that do not separate into several parts is very small - there are around 100 such strings - so we get a linear recurrence relation on a 100-element vector and can use fast matrix exponentiation to apply it many times quickly.
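The "fast matrix exponentiation" part is generic, so here is a small Python sketch of it. The helper names `mat_mul` and `mat_pow` are mine, and the 2x2 Fibonacci matrix is only a toy stand-in: for the look-and-say problem the matrix would have roughly 100 rows, one per non-splitting string, and building it is not shown here.

```python
def mat_mul(a, b, mod):
    """Product of two matrices given as lists of rows, entries taken mod `mod`."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) % mod
             for j in range(cols)]
            for row in a]


def mat_pow(base, exp, mod):
    """base**exp by repeated squaring, in O(size^3 * log(exp)) operations."""
    size = len(base)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    while exp > 0:
        if exp & 1:
            result = mat_mul(result, base, mod)
        base = mat_mul(base, base, mod)
        exp >>= 1
    return result


# Toy usage: the 10th Fibonacci number sits in the corner of M**10.
fib = mat_pow([[1, 1], [1, 0]], 10, 10**9 + 7)[0][1]
```

With about 60 squarings for exponents near 10^18, even a 100x100 matrix stays well within reach.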

And the third problem was about reconstructing the smallest parallelepiped drawn on the grid that contains the two given cells. Here's the solution that avoids a bulk of case studies that I was referring to: when the two given points are close to each other, we can just try all small parallelepipeds until we find one that covers both; when they are far from each other (let's say at least 10 apart), it's not hard to see that there's a simple lower bound on the answer that is achievable. More specifically, let the parallelepiped horizontal side be a, vertical side be b, and the diagonal side be c. The y-coordinate can be increased at most b+c-2 times (and similarly the x-coordinate can be increased at most a+c-2 times) inside the parallelepiped, so b+c-2 must be at least y2-y1, and thus a+b+c (which is almost the answer) must be at least 5+y2-y1. Also, the difference between the coordinates doesn't change when we move diagonally, so it can only increase at most a+b-2 times, so the answer must be at least 5+(y2-x2)-(y1-x1). The only remaining step is to understand that the maximum of those lower bounds is always achievable if the two given points are sufficiently far apart.

Now, let's finish digging into the Open Cup archives! Open Cup 2014-15 Grand Prix of China happened 2 weeks ago (results, top 5 on the left). Let me describe the problem that I couldn't solve. You're given a simple undirected connected graph with at most 8 vertices. Let's say each edge has a random uniform real weight between 0 and 1. What's the expected value of the minimum spanning tree of this graph? It would seem that with 8 vertices any solution should work, but I've managed to implement one that times out :) Of course, the key to creating a fast solution lies in linearity of expectation. Can you see how to use it?

And finally, Open Cup 2014-15 Grand Prix of Tatarstan happened this Sunday (results, top 5 on the left). I've learned a new beautiful data structure at this contest - a rare event in my old age. Here's the problem that required it: you are given a tree with at most 10^5 vertices, where each edge has an integer length, and a sequence of 10^5 updates and queries. Each update tells you to color all vertices in the tree that are at most the given distance from the given vertex with the given color. Each query requires you to output the current color of a given vertex.

I couldn't invent the data structure during the contest time, nor did I know it in advance, so I only managed to come up with an O(N*sqrtN) solution for the case where all update distances are equal. But it turns out that the problem is solvable in O(N*logN*logN) for arbitrary updates. Do you know which data structure helps here?

Thanks for reading, and see you next week!

Saturday, March 7, 2015

This week in competitive programming

Codeforces Round 295 happened early on Monday (problems, results, editorial with challenges, top 5 on the left). Let me describe problem D which I didn't solve during the round: you have an array with 10^5 positive integers, denoting your various skill levels in an online game. You're also given up to 10^5 possible improvements for your skills. Each improvement is applicable to one particular skill, and is of one of three types: set the skill level to the given value, add the given value to the skill level, or multiply the skill level by the given value. The skill that can be improved, the type of improvement, and the improvement value are fixed for each possible improvement, so the only freedom you have is which improvements you will apply, and in which order. Your goal is to achieve the maximum product of all your skill levels using at most m improvements (m is also given).

This problem requires careful reduction of complexity until it becomes simple. The first step, for example, is to notice that it only makes sense to apply the "multiplication" improvements after all others, and the order of their application does not matter. I've managed to do a few more steps during the contest, but stopped short of the solution because I couldn't find a way to properly handle the "assignment" improvements. Can you see the remaining steps?

Facebook Hacker Cup 2015 Final Round happened on Friday in Menlo Park (results, top 5 on the left). I've managed a very good start by getting the first submission and then skipping the tricky geometry problem (the "Fox Lochs" column in the above table). After submitting the 20-point problem with almost two hours left, I had two strategies to choose from. I could go back to the geometry problem, implement it carefully, test it a lot, but thus likely not solve anything else - looking at the final scoreboard, that would've earned me the second or third place in case my solution was correct - but given that even Gennady had failed this problem, that's far from certain. Or I could try to solve one of the two harder problems which seemed to require some thinking but looked tractable.

I've decided to go after the 25-point problems, and after some thinking came up with an O(N*sqrtN) solution for the fourth problem ("Fox Hawks"). The problem was: you're given a boolean expression with at most 200000 boolean variables, each appearing exactly once, for example "((1 & (2 | 3)) | 4)". What's the lexicographically k-th set of variable values that evaluate this expression to true?

It was not clear whether O(N*sqrt(N)) would pass within the time limit, so I hoped for the best and started implementing it. The implementation was a bit tricky and required more than an hour (including writing a simple but slow solution and comparing on a lot of small testcases). I've finally downloaded the input with about 30 minutes left in the contest - and my solution turned out a bit too slow (solved 12 testcases out of 20 in the time limit) :( Had I implemented it a bit more efficiently, or had I used a more powerful computer, I might've got it, and it turns out that would've earned me the victory. Well, better luck next time!

After discussing my solution with Slava "winger" Isenbaev after the contest, I've also realized that it's not hard to change it into an O(N*logN) approach. I've done that after the contest, and after about 20 minutes of coding and 5 minutes of debugging I got a working solution that solved all cases in about 20 seconds (out of 6 minutes). Can you guess what the O(N*logN) approach is, knowing that it's an improvement of an O(N*sqrtN) one?

Now, let's continue covering the Open Cup contests from February. Before presenting a new round, let me give the solution ideas for the problems I posted last week.

The first problem was about finding the lexicographically k-th borderless word of length n (up to 64), where a word is borderless when it has no non-trivial borders. In order to solve this problem, let's learn to count borderless words first. The first idea is to notice that if a word of length n has any borders, then it must also have a border of length at most n/2 (because if a prefix is equal to a suffix and they intersect, then we can find a shorter prefix that is equal to a suffix - see the picture on the left). Because of this, when n is odd, the number of borderless words of length n is equal to the size of the alphabet times the number of borderless words of length n-1 - we can simply put an arbitrary character in the middle of a word of length n-1. And when n is even, the number of borderless words of length n is equal to the size of the alphabet times the number of borderless words of length n-1 minus the number of borderless words of length n/2: we can also put an arbitrary character into the middle of a word of length n-1, but we need to subtract cases where the new word has a new border of size n/2. Finding the lexicographically k-th borderless word is done in a very similar manner: instead of just counting all borderless words, we can use the same approach to count all borderless words with the given prefix.
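The counting recurrence is easy to check against brute force, which is a useful sanity test before attempting the k-th word part. Here is a small Python sketch; the function names are mine, and treating the empty word as borderless (so that the count for length 0 is 1) is a convention I'm assuming to make the base case uniform.

```python
from itertools import product


def count_borderless(n, k):
    """Number of borderless words of length n over an alphabet of size k."""
    u = [0] * (n + 1)
    u[0] = 1  # convention: the empty word counts once
    for length in range(1, n + 1):
        # Insert an arbitrary character into the middle of a shorter word.
        u[length] = k * u[length - 1]
        if length % 2 == 0:
            # Remove the words that gained a border of size length/2.
            u[length] -= u[length // 2]
    return u[n]


def is_borderless(word):
    """True if no proper non-empty prefix of word equals a suffix."""
    return not any(word[:b] == word[-b:] for b in range(1, len(word)))


def count_borderless_brute(n, k):
    return sum(is_borderless(w) for w in product(range(k), repeat=n))
```

For a binary alphabet the recurrence gives 2, 2, 4, 6, 12, 20 for lengths 1 to 6, and the brute force agrees.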

The second problem was about finding two deterministic finite automata with at most n+1 states that, when used together, accept the given word and only that word. The automaton with n+2 states that accepts only the given word is straightforward. How do we get rid of one state? Well, let's glue two adjacent states together. This will result in an automaton that accepts the given word, but also all words where the letter in a certain position can be repeated an arbitrary number of times. If we do this once for the first letter, and the second time for the last letter, we'll obtain the two automata that we need, unless all letters in our word are equal. And in case all letters in our word are equal, it's not hard to see that there's no solution.

Now, on to new tricky problems! Open Cup 2014-15 Grand Prix of Karelia happened 3 weeks ago (results, top 5 on the left). The most difficult problem F was about Conway's look-and-say sequence starting with 2: we start with the single digit 2, then repeatedly describe what we see. For example, on the first step, we see one "2", so we write "12". On the second step, we see one "1" and one "2", so we write "1112". On the third step, we see three "1"s and one "2", so we write "3112", and so on. How many digits does the n-th step contain (modulo p)? n is up to 10^18.
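To play with the sequence, a direct simulation takes just a few lines of Python. This is only a toy for small n, with a function name of my choosing; it obviously cannot reach n anywhere near the real limit.

```python
from itertools import groupby


def look_and_say_step(s):
    """Describe s: for each run of equal digits, write its length then the digit."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(s))


# The first few steps starting from "2".
steps = ["2"]
for _ in range(4):
    steps.append(look_and_say_step(steps[-1]))
```

A simulation like this is also handy as a slow reference for checking a fast solution on small inputs.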

Open Cup 2014-15 Grand Prix of Udmurtia happened 2 weeks ago (results, top 5 on the left). Let's talk about a relatively easy problem for a change.

Problem B of this round was concerned with drawing parallelepipeds on a grid. More specifically, a parallelepiped is drawn on a grid like this: we start with a rectangle, then add three diagonal segments of the same length, and then connect their ends as well - see the picture on the left. The parallelepiped has three parameters: the two sides of the rectangle, and the size of the diagonal. All three parameters must be at least 3.

A parallelepiped was drawn, but then all squares but two were erased, so you're given the coordinates of the two remaining squares, each up to a billion. What's the smallest possible total number of cells in the original drawing?

This problem looks quite nasty from the outside, and it feels like it can have a lot of tricky corner cases. But it turns out it's possible to write a solution that sidesteps those and solves everything in a quite general manner. How would you approach this problem?

Thanks for reading, and check back next week!