Sunday, March 1, 2015

This week in competitive programming

There were three big competitions planned for this week: Challenge24 2015 Electronic Contest, TopCoder SRM 651 and Open Cup 2014-15 Grand Prix of China. However, the first two faced some serious issues, and the Open Cup is yet to happen tomorrow. Given that I've falled a bit behind on the Open Cup rounds, I will keep this week's summary without any news, and continue describing the Open Cup rounds from February.

Last week we discussed a very nice problem from the Grand Prix of Japan: count the number of pairs of strings (s, t) such that the length of s is N, the length of t is M, each character is from an alphabet of size A, and t is a substring of s.

Suppose the string t is fixed. How do we count the number of strings s of given length that contain t? We will count two things: the number an of strings s of each length n between 1 and N that do not contain t, and the number bn of strings s of each length n that contain t, but only as a suffix and not before. First approximation for an is simple: we have to take a string of length n-1 and add one character to it, so we have an=an-1*A. Of course, that formula is not correct because it doesn't take t into account at all. How could t appear in the length-n string, if the first n-1 characters didn't contain t? Well, it must appear at the rightmost position, so the correct formula is an=an-1*A-bn.

But how do we find bn? Well, the first approximation is also simple: we have to take a string without any appearances of t, and add t in the end: bn=an-M. However, this is also not correct because by adding t in the end, we might've also added some occurrences of t before the end, so we need to subtract something. Well, let's look at the first appearance of t that we added - let's say it ends p (p>0) characters from the end. This can happen only if t has a border (a string that is both its prefix and its suffix) of length M-p - see the picture on the left. And for each string of length n-p that ends with t, there's exactly one way to extend it to a "bad" string of length n. So the correct formula is: bn=an-M-sum(bn-p), where the sum is taken over all positive p such that there's a border of length M-p in t.

In other words, it turns out that knowing all border lengths of t is sufficient for finding the number of strings of any given length that contain it. At the same time, not all sets of border lengths are possible, since having even one border implies a lot of equalities between the string characters, and in case we have several, the string quickly converges to having all characters equal, and thus all border lengths at the same time. This idea inspires a reasonably simple backtracking: let's try to build all non-contradictory sets of border lengths for a string of length M by recursively trying to include/not include the longest border (of length M-1), then M-2, and so on. We will bail out from the recursion as soon as we have a contradiction. In order to check if we have one, we will build equivalence classes of characters according to borders that do exist, and check if they imply a border that we've declared as not existing.

This recursion runs almost instantaneously for M=50, and reports that there are only 2249 non-contradictory sets of border lengths. We have almost solved our problem: we can iterate over all possible sets of border lengths, and find the number of strings s that contain a string t with such set of border lengths using the dynamic programming described above. But note that we must also multiply that by the number of such strings t - so we also need to find the number of strings with the given set of border lengths.

The latter, on the surface, is also computed easily: it's simply A to the power of the number of equivalence classes defined by the borders. But as you've probably guessed, this will overcount: here we won't throw out the strings that have some additional borders. But since we have just 2249 different sets of borders, we can fix this by simply subtracting the numbers of strings for all supersets of our set of border lengths, obtaining a dynamic programming approach for counting t's. And that completes the solution!

The next round of the Open Cup, Northern Grand Prix, happened three weeks ago (results, top 5 on the left). Just like the Grand Prix of Japan, this round featured many interesting problems, but I'd like to highlight two. Problem B was also about string borders: one needed to count the k-th lexicographically borderless word of length n (up to 64), where a word is borderless when it has no non-trivial borders. Riding the wave of the solution explained above, I've tried to use the same approach by using inclusion-exclusion principle over sets of possible border lengths - but unfortunately, this approach was a bit too slow for the time limit. Can you see how to solve this problem faster?

Problem D considered languages defined by deterministic finite automata. More precisely, one needed to define a language containing just the given word, and no other words. When the given word has length n, it's easy to construct an automaton with n+2 states that accepts the given word and only the given word: a chain of n+1 states that have transitions using the characters of the word, the (n+1)-st state being final and all other transitions leading to (n+2)-nd "sink state" which is not final. This problem, however, limited the automata to n+1 states. It's not hard to see that it's impossible to create an automaton with n+1 states that accepts the given word of length n and only that word. Because of that, you were asked two output two automata with at most n+1 states that have the language with just one word in their intersection - in other words, such that both of them accept the given word of length n, and there's no other word such that both of them accept it. Can you figure this one out?

Thanks for reading, and check back next week!

Sunday, February 22, 2015

This week in competitive programming

Codeforces Round 292 took place late on Tuesday (problems, results, top 5 on the left). Haghani has pulled off an impressive victory given that he was not the rating favorite - but of course he did climb much higher in the ratings as a result, now he's ranked 28th. Congratulations! rng_58, on the other hand, could've probably been the rating favorite if not for his recent strategy of solving the hardest problem first. Once again he became the only person to solve the said hardest problem, but couldn't solve two other problems in time and only placed 10th.

TopCoder SRM 650 started next morning less than 8 hours after the Codeforces round ended (problems, results, top 5 on the left), so big congratulations to Endagorion on winning it in addition to placing 3rd earlier, and for being the only one to solve the hard problem!

There were also four Open Cup rounds in January and February. The problems from those rounds are not published yet, but it's been a month now and it's boring to wait longer, so I will try to explain the problem statements in full below.

Open Cup 2014-15 Grand Prix of Japan took place three weeks ago (results, top 5 on the left). Problem I was, on the surface, a repeat of a problem from Petr Mitrichev Contest 9 back from 2011: it simply asked to count the number of pairs of strings (s, t) such that the length of s is N, the length of t is M, each character is from an alphabet of size A, and t is a substring of s. When this problem was given on my contest, N, M and A were up to 100, and you needed to output a floating-point number - the probability that a random pair has this property. No team could solve this problem during the contest - take a look at problem H in standings. The reference solution made heavy use of the fact that we need to output a fraction and are allowed 10-9 error, since for longer substrings the probability is almost zero, and for shorter substrings we can iterate over all possible prefix functions of the substring, and when the prefix function is fixed counting superstrings is a relatively straightforward dynamic programming on the Knuth-Morris-Pratt automaton.

Fast forward 3.5 years, and now the harder version is given on a contest: now N is up to 200, M is up to 50, A is up to 1000, and you need to find the answer modulo 109+7, so you can't squeeze by with approximation arguments anymore. And one team does solve this problem - congratulations to Alex, Slava and Artem!

The key to solving the harder version is to notice that we don't, in fact, need the entire prefix function of the substring to count the number of superstrings containing it: it turns out that we can just look at the set of prefixes of the substring that are also its suffixes, in other words at the set of its borders. Can you see why the borders are enough?

Thanks for reading, and check back next week for the full explanation of this problem's solution, and for a new hard problem!

Thursday, February 19, 2015

This week in competitive programming

Theorem (Generalized Cayley's formula?).

Given a graph with n labeled vertices, let s be the number of its connected components and m1,m2,...,ms be the numbers of vertices in those components. Then the number of ways to add s-1 edges to this graph so that it becomes connected is ns-2∏mi.

When Mikhail `Endagorion' Tikhomirov told me this theorem, I was very suprised that it looks so simple yet I've never encountered it before. Is it a well-known result? Does it have a name?

This theorem comes in handy when solving a problem from previous week's summary: how many simple undirected graphs are there with n labeleld vertices and k bridges?

Here's how we can solve this problem: in addition to the numbera an,k of graphs with n vertices and k bridges, we'll also count the number bn,k of connected graphs with n vertices and k bridges (including, as k=0 case, the number of biconnected graphs), the value cn,k which is the sum over all graphs with n vertices and k connected components, such that all its connected components are biconnected, of the products of sizes of the components, and finally the number dn of connected graphs with n vertices.

Let's start with bn,k with positive k. Since the graph has at least one bridge, the bridges split it into several biconnected components, and the Generalized Cayley's formula mentioned above helps count the ways to join those biconnected components with bridges into a connected graph, so we get bn,k=nk-1*cn,k+1. Now bn,0 is simply dn-sum(bn,k).

To find cn,k, we first notice that cn,1=n*bn,0. For larger k, we should consider how many vertices are there in the biconnected component the contains the first vertex - let's say it has m vertices, then we have to multiply the number of ways to pick the remaining m-1 vertex in the component by the numbe of ways to pick the remaining components. In other words, cn,k=sum(comb(n-1,m-1)*cm,1*cn-m,k-1).

Finding dn is a standard question that is solved in a similar way: connected graphs are all graphs minus disconnected graphs, and disconnected graphs can be categorized by the size of the connected component containing the first vertex. In other words, dn=2n*(n-1)/2-sum(comb(n-1,m-1)*dm*2(n-m)*(n-m-1)/2).

And finally, we can find the value that we need: an,k. Again, let's look at the connected component containing the first vertex, and suppose it has m vertices and p bridges. Then we get an,k=sum(comb(n-1,m-1)*bm,p*an-m,k-p).

We compute O(n2) values, spending O(n) for each value in b, c and d, but O(n2) for each value of a, so the total running time is O(n4).

Here's the code:
public class Fragile {
    static final long MODULO = (long) (1e9 + 7);

    public int countGraphs(int N, int K) {
        long[][] a = new long[N + 1][N + 1];
        long[][] b = new long[N + 1][N + 1];
        long[][] c = new long[N + 1][N + 1];
        long[] d = new long[N + 1];
        d[0] = 1;
        long[][] comb = new long[N + 1][N + 1];
        comb[0][0] = 1;
        a[0][0] = 1;
        long[] p2 = new long[N * (N - 1) / 2 + 1];
        p2[0] = 1;
        for (int i = 1; i < p2.length; ++i) {
            p2[i] = 2 * p2[i - 1] % MODULO;
        }

        for (int n = 1; n <= N; ++n) {
            comb[n][0] = 1;
            for (int k = 1; k <= N; ++k) {
                comb[n][k] = (comb[n - 1][k - 1] + comb[n - 1][k]) % MODULO;
            }
            d[n] = p2[n * (n - 1) / 2];
            for (int m = 1; m < n; ++m) {
                d[n] = (d[n] - comb[n - 1][m - 1] * d[m] % MODULO * p2[(n - m) * (n - m - 1) / 2] % MODULO + MODULO) % MODULO;
            }
            for (int k = 2; k <= n; ++k) {
                for (int m = 1; m < n; ++m) {
                    c[n][k] = (c[n][k] + comb[n - 1][m - 1] * c[m][1] % MODULO * c[n - m][k - 1]) % MODULO;
                }
            }
            b[n][0] = d[n];
            long pw = 1;
            for (int k = 1; k < n; ++k) {
                b[n][k] = pw * c[n][k + 1] % MODULO;
                pw = pw * n % MODULO;
                b[n][0] = (b[n][0] - b[n][k] + MODULO) % MODULO;
            }
            c[n][1] = n * b[n][0] % MODULO;
            for (int k = 0; k < n; ++k) {
                for (int m = 1; m <= n; ++m) {
                    for (int p = 0; p <= k; ++p) {
                        a[n][k] = (a[n][k] + comb[n - 1][m - 1] * b[m][p] % MODULO * a[n - m][k - p]) % MODULO;
                    }
                }
            }
        }

        return (int) a[N][K];
    }
}
Do you know a simpler way to solve this problem? Do you know a faster way to solve this problem?

My solution at the contest was even more complicated with O(n4) states and O(n6) running time, so it didn't complete in 2 seconds and I had to precompute the answers and submit those. I've used the following dynamic programming: how many graphs are there with n vertices, m vertices in the biconnected component containing the first vertex, k bridges, and p bridges going out from the biconnected component containing the first vertex.

Now, let's turn to last week's only regular online competition - TopCoder SRM 649 (problems, results, top 5 on the left). Fourteen people solved all three presented problems, but sdya has managed to create a comfortable 100+ point margin with fast solutions and a challenge - congratulations!

Thanks for reading, and check back next week!

Friday, February 13, 2015

This week in competitive programming

TopCoder SRM 648 has started the last week's contests as early as Monday (problems, results, top 5 on the left, my screencast). The hard problem, while innocent on the surface, turned out to be exceptionally tricky, requiring some serious combinatorics and dynamic programming mastery - check it out! The statement is as simple as: how many simple undirected graphs are there with N labeled vertices and K bridges? N is up to 50.

Codeforces Round 290 took place just several hours later (problems, results, top 5 on the left, my screencast). Problem D was another cute combinatorics exercise. Its most interesting part boiled down to: given a tree, how many ways are there to remove its leaves one by one until we remove everything? Of course, new leaves may appear after removing some. Had the tree been rooted, this would be a pretty standard dynamic programming application, but how to deal with the fact that there's no root?

Finally, Rockethon 2015 gave everybody a shot at prizes on Saturday (problems, results, top 5 on the left). Gennady has carefully progressed through all problems, not really leaving anybody else a chance - congratulations!

There were also several Open Cup rounds in the past weeks, and the Petrozavodsk training camp has assembled an unprecedented number of ACM ICPC teams - but as the same problems are still going to be used for other training camps, I will postpone discussing those for now. Stay tuned for some really tough problems in the coming weeks!

Wednesday, February 4, 2015

This week in competitive programming

Facebook Hacker Cup 2015 has completed its weekly routine with Round 3 on Saturday night (problems with Facebook login, results with Facebook login, top 5 on the left, my screencast). Top 25 will travel to the United States for the final round on March 6, and big congratulations to Gleb who claimed the first place in this really competitive round!

Let's also come back to the interesting problem I mentioned in last week's summary: you are given a directed graph where each arc has a cost, one of the vertices is marked as starting point, and one as ending point. You need to pick a subset of arcs with the smallest total cost such that every path from the starting point to the ending point (including paths that pass through the same vertex or arc more than once) passes through arcs from this subset exactly once.

Whenever we need to pick a subset of arcs that splits a graph in two, the theory of flows and cuts comes to mind. However, we're not simply asked to find a minimum cut on a given graph, and not even on its condensation, as illustrated by the following acyclic example:

In this graph, the minimum cut is of size 2 and separates vertices S and A from vertices B and T. However, the path S-B-A-T crosses the cut three times (twice going outside, and once going inside), and we need for each path to cross the cut exactly once. Basically, the minimum cut is solving the "at least once" version.

But the power of the flow/cut theory often lies in applying it to slightly modified graphs. For cuts in particular, adding infinite arcs to the graph is usually the solution of choice. Here our goal is to make sure one can't re-enter the cut after leaving it, so for each arc we'll add a reverse arc with infinite capacity, as illustrated by the following picture:

In particular, the newly added A-B arc of infinite capacity changes the incorrect cut displayed above from capacity 2 to infinite capacity (1+1+infinity), and thus the smallest cut is now of size 3, displayed by the dotted line to the left. There's also another cut of size 3 - just cut off the source vertex S. It's not hard to verify that every path from S to T croses this cut exactly once.

Another great thing about this addition of infinite arcs is that it allows us to skip the condensation step. If there's a cycle somewhere in our graph, we'll thus add a reverse infinite cycle, and thus no minimum cut will ever cross any cycle!

Such observations make this solution "click" when you see it: instead of fighting with the problem, everything flows naturally and the solution makes sense from all angles. I especially value such moments, I think they show the beauty of mathematics and algorithms.

However, everything was not that simple in this problem :) It turns out that in order for this solution to work correctly in all cases, we need to prune the graph before adding the infinite arcs. More specifically, we need to remove all vertices that don't lie on any path from S to T from the graph. Can you see where we went wrong, and why doesn't the cut theory save us in this case?

Thanks for reading, and check back next week!

Sunday, January 25, 2015

This week in competitive programming

Saturday night was the competition time this week, starting with TopCoder SRM 647 (problems, results, top 5 on the left, my screencast). The hard problem had a very simple and interesting statement, might've looked as a relatively straightforward maximum flow application, but it came with an unexpected twist that foiled many experienced competitors. You were given a directed graph where each arc has a cost, one of the vertices is marked as starting point, and one as ending point. You needed to pick a subset of arcs with the smallest total cost such that every path from the starting point to the ending point (including paths that pass through the same vertex or arc more than once) passes through arcs from this subset exactly once.

The idea that we can use minimum cut theory to find this subset looks natural. But just applying minimum cut algorithms on the given graph doesn't fit the bill: it is indeed guaranteed that we will cross every cut at least once when walking from the starting point to the ending point, but we might also cross a cut twice or more, and this is not allowed by the problem statement. The usual recipe in situations like this is adding auxiliary infinite arcs to the graph which will essentially limit the set of cuts we look at, since the minimum cut will obviously contain no infinite arcs. Can you see how to add infinite arcs to this graph to solve the problem?

Just a few hours later Facebook Hacker Cup 2015 Round 2 narrowed the field of competitors to just one hundred algorithmists (problems with Facebook login, results with Facebook login, top 5 on the left, my screencast). Once again, the really punishing scoring model meant making sure that the solutions are correct was more important to qualifying than solving problems fast. Nevertheless congratulations to Ivan on claiming the first place! The competition at the ACM ICPC World Finals is going to be very fierce this year.

 
Somewhat orthogonally, here are a couple of pictures from my recent touristic visit to London. Quite different skies and quite different surroundings, London is full of everything :) Thanks for reading, and check back next week!

Thursday, January 22, 2015

This week in competitive programming

Contrary to last week's serenity, this week has been really busy in terms of contests. Early on Monday, Codeforces Round 285 started the sequence (problems, results, top 5 on the left). Competitors in places from second to fifth remained within one successful challenge from each other, but Boris was not within reach and achieved a commanding victory - congratulations!

TopCoder SRM 645 happened later on Monday (problems, results, top 5 on the left, my screencast). The medium problem required careful logical thinking to apply several relatively easy steps in sequence to obtain a working solution. You were given two sets of points on the plane, of the same size, which was at most 50, and also exactly three magic points on the same plane. In one step, you were allowed to pick one of the three magic points, and rotate all points in the first given set by 180 degrees with the selected magic point as the center of rotation. Note that all points are rotated at the same time and with the same center of rotation. Is it possible to obtain the second set of points via zero or more such steps from the first set of points?

TopCoder SRM 646 continued the fun early on Friday (problems, results, top 5 on the left). Tiancheng continued to show impressive form (previous post) and climbed into the 3rd spot (shared with Tomek) in all-time SRM wins stats - awesome performance, both this week and in the past years!

Facebook Hacker Cup Round 1 occupied 24 hours on the weekend (problems with Facebook loginresults with Facebook login, top 5 on the left, my screencast). Of course, one didn't need to spend that much since the problems were relatively simple. The challenge, as usual with the Hacker Cup, was in the format that punishes every mistake very heavily: since the round lasted for 24 hours, many competitors who would not solve these problems in tighter time constraints got a chance to steadily work towards solving all four, and that, in turn, meant that most likely all 500 qualifying spots would be taken by people who solved everything, and thus almost any mistake could mean a good bye to the Hacker Cup. The cutoff ended up being 75 points, not 100 points, but still mistakes in solving the hardest "Corporate Gifting" problem were punished with elimination. Because of this, this round brought slightly different qualities into the limelight: testing one's solutions very carefully, the ability to read the statement without any omissions, and having a good proof for each and every mathematical solution. To put it another way, it was all about patience :)

Finally, Codeforces Round 286 ended the week late on Sunday (problems, results, top 5 on the left, my screencast). After reading all problems, I was unable to come up with an algorithm to any of the five problems, which is quite unusual for Codeforces since problems A and B are usually not that difficult. Given that all problems required thinking, I've decided that I might as well go after the one that was scored the highest, and that turned out to be too optimistic. The problem simply asked to count the number of palindromes of the given length (up to 109) containing the given string (of length at most 200) as a subsequence (not necessarily as a substring - i.e. the characters don't have to be consecutive), modulo 10007.

You can see my slow progress towards a solution in the screencast: at about 37 minutes mark of the screencast, I've implemented a slow solution and automatic period detection, hoping that the answers for consecutive lengths would form a sequence with a short period - but they would not. Then, at about 50 minutes into the screencast, I came up with the idea that the slow solution can actually be improved by considering segments of palindrome before the next match of the subsequence at once, and thus separating the result into a large sum of products of powers of 24, 25 and 26 (more details probably in a later post), but after implementing this part around 1 hour 1 minute mark I've realized that I still have no idea how to quickly compute the sum I reduced the problem to. Around 1 hour 12 minute mark I've realized that this sum can be computed by taking 200 relatively small matrices to a large power modulo 10007, something we can do relatively quickly. I've finished the implementation and submitted around 1 hour 22 minute mark - only to learn that the solution is still too slow. Some more thinking led me to the idea of computing all 200 numbers I need using just one matrix power, but the matrix unfortunately became bigger, and thus the solution was still about 50% slower than needed. 1.5x was already in the 'non-asymptotic' speedup area, so I've switched to speeding up matrix multiplication, first by using the modulo operation less often, and then by taking advantage of the fact that the matrix had corners of zeroes that didn't need to be recomputed. That has finally enabled me to make a successful submission at 1 hour and 49 minutes into the contest, with too little time left to solve any other problem.

In the meantime, Ilya has solved the remaining four problems and claimed the victory - congratulations!

And finally, let me finish this post with something different: a chess puzzle! More specifically, I want to share a wonderful gem of a chess website, lichess, that provides a training area with puzzles like this one: http://en.lichess.org/training/23280.

Their idea is that since many people are playing chess against each other on their website, they can simply find "interesting" situations in those games automatically - probably when computer could see a unique sequence of good moves in a position that's losing otherwise - and present those situations as puzzles that make people learn real-world chess tactics, not just apply abstract backtracking skills. Quite fittingly, all puzzles are asking you to find the best move for Black/White, not to find a mate in three or something like that, since the former is what actually matters in real chess games. In particular, the puzzle referenced above was taken from this blitz game, where the real black player could not find the right sequence of moves and lost.

The puzzles also have the usual up/down voting buttons, so as hundreds and thousands of people are going through them, the system can become even more intelligent and suggest only quite interesting puzzles to most. The puzzle linked above and shown on the left was presented to me recently, and I've enjoyed solving it - did you?

Also, I can't help but wonder if we can come up with similar training approach for algorthmic programming based on the enourmous base of contest solutions that we've accumulated over the years.

Thanks for reading, and check back next week!