Authors' Names

Patrick Kreitzberg

Presentation Type

Oral Presentation

Abstract/Artist Statement

If you are on a budget, how might you go about finding the best drink and entrée combination at a restaurant? You may simply choose the least expensive items, but a water and a side salad is not a great dinner. Instead, you may want to judge the ten least expensive drink and entrée combinations and pick your favorite. If you create a list of drink prices and a list of entrée prices, then all possible combinations of a drink and an entrée form the Cartesian product of the two lists. You would then choose from the ten least expensive meals in that Cartesian product.
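
The restaurant example can be sketched by brute force. This is only an illustration of the problem statement, not the algorithm presented in the talk: the menu items and prices (in cents) are invented, and `cheapest_meals` is a hypothetical helper that enumerates the whole Cartesian product before picking the k cheapest.

```python
import heapq
from itertools import product

# Hypothetical menu prices in cents, invented for illustration.
drinks = {"water": 0, "soda": 250, "lemonade": 300, "iced tea": 275}
entrees = {"side salad": 400, "burger": 950, "pasta": 1100, "tacos": 825}


def cheapest_meals(drinks, entrees, k):
    """Return the k least expensive (total, drink, entree) combinations."""
    # product() enumerates every drink/entree pair, i.e. the Cartesian product.
    meals = (
        (d_price + e_price, d, e)
        for (d, d_price), (e, e_price) in product(drinks.items(), entrees.items())
    )
    # nsmallest keeps only the k cheapest totals, in ascending order.
    return heapq.nsmallest(k, meals)


for total, drink, entree in cheapest_meals(drinks, entrees, 10):
    print(f"{total / 100:6.2f}  {drink} + {entree}")
```

Enumerating every pair is fine for a menu, but the cost grows with the size of the full product, which is what the methods discussed next avoid.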

Finding the smallest k values of X+Y, the sums x_i + y_j taken over the Cartesian product of two lists X = {x1, x2,...} and Y = {y1, y2,...}, is a well-studied, fundamental problem in computer science. Several methods solve this problem with a runtime proportional to n + k, where n is the length of each list. This is the best runtime possible, since all input and output values must be touched at least once. The generalization of the problem, where the Cartesian product is taken over many lists X1+X2+···+Xm, has not previously had a fast algorithm. We present an algorithm for the generalization which is faster than m•n + k•m. This is remarkable because loading m lists, each with n values, has runtime m•n, and looking up k values in m lists has runtime k•m.
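
For the two-list case, a minimal sketch of a well-known heap-based approach is shown here. It is not one of the n + k methods mentioned above and not the algorithm presented in this talk: it sorts both lists and uses a binary heap, so its runtime is on the order of n•log(n) + k•log(k). The function name `smallest_k_pairwise_sums` is hypothetical.

```python
import heapq


def smallest_k_pairwise_sums(xs, ys, k):
    """Return the k smallest values of x + y, in ascending order."""
    xs, ys = sorted(xs), sorted(ys)
    if not xs or not ys:
        return []
    # Heap entries are (sum, i, j); (0, 0) is the smallest possible sum.
    heap = [(xs[0] + ys[0], 0, 0)]
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        total, i, j = heapq.heappop(heap)
        out.append(total)
        # Each popped index pair exposes its two neighbours as new candidates.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(xs) and nj < len(ys) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (xs[ni] + ys[nj], ni, nj))
    return out


print(smallest_k_pairwise_sums([1, 7, 3], [2, 10, 4], 4))
```

Only index pairs near the already-reported sums are ever examined, so the full product of n•n sums is never built.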

In computer science, there are many different structures used to store data. To get a fast runtime, we use a new data structure called a "layer-ordered heap", which gives information about the ordering of the values in a list without completely sorting the data. It may seem intuitive to use sorting, since we want to find the smallest values; however, sorting a list of k values by comparisons has a runtime of at least k•log(k). In the runtime of our method, we want the term which depends on k to grow more slowly than k•log(k), so we cannot use sorting. Keeping the data organized so that it has some ordering, but is not completely sorted, is the key to our algorithm.
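
The idea of ordering without full sorting can be sketched as follows. The abstract does not give the structure's details, so this block rests on assumptions: values are split into layers of sizes 1, 2, 4, ... (the doubling is chosen for illustration), every value in a layer is no larger than every value in the next layer, and values inside a layer are left unordered. The helpers `split_smallest`, `build_layer_ordered_heap`, and `is_layer_ordered` are hypothetical names, and the construction uses randomized selection rather than whatever method the authors use.

```python
import random


def split_smallest(values, s):
    """Split values into (the s smallest, the rest) without sorting either part.

    Assumes 0 <= s <= len(values). Uses randomized selection (quickselect style).
    """
    small, rest, pool = [], [], list(values)
    while len(small) < s:
        need = s - len(small)
        pivot = random.choice(pool)
        below = [v for v in pool if v < pivot]
        equal = [v for v in pool if v == pivot]
        above = [v for v in pool if v > pivot]
        if need <= len(below):
            # Everything still needed lies strictly below the pivot.
            rest.extend(equal)
            rest.extend(above)
            pool = below
        elif need <= len(below) + len(equal):
            # The boundary falls among values equal to the pivot.
            cut = need - len(below)
            small.extend(below)
            small.extend(equal[:cut])
            rest.extend(equal[cut:])
            rest.extend(above)
            pool = []
        else:
            # All of below and equal are needed; keep searching in above.
            small.extend(below)
            small.extend(equal)
            pool = above
    rest.extend(pool)
    return small, rest


def build_layer_ordered_heap(values):
    """Return layers of sizes 1, 2, 4, ... (last may be partial), ordered between layers."""
    values = list(values)
    starts, start, size = [], 0, 1
    while start < len(values):
        starts.append(start)
        start += size
        size *= 2
    layers = []
    # Peel off the largest layer first; the remaining prefix roughly halves each time.
    for start in reversed(starts):
        values, layer = split_smallest(values, start)
        layers.append(layer)
    layers.reverse()
    return layers


def is_layer_ordered(layers):
    """Check that each layer's maximum is no larger than the next layer's minimum."""
    return all(max(a) <= min(b) for a, b in zip(layers, layers[1:]))


layers = build_layer_ordered_heap([9, 4, 7, 1, 8, 2, 6, 3, 5, 0])
print([len(layer) for layer in layers], is_layer_ordered(layers))
```

Because each layer is only separated from its neighbours and never sorted internally, the smallest values can be reached layer by layer without paying the cost of a full sort.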

One important application of our algorithm is to calculate the most abundant isotopes of a molecule. The isotopes of an element (e.g. oxygen) are the variants of that element which differ in their number of neutrons. For example, carbon dioxide, CO2, is made up of one carbon and two oxygens. Carbon has two isotopes which appear in nature, 12C and 13C, while oxygen has three, 16O, 17O, and 18O. This means that carbon dioxide may naturally form 2 × 3 × 3 = 18 combinations of isotopes (12 distinct ones if the two oxygens are treated as interchangeable), given by the Cartesian product of three lists: {12C, 13C}, {16O, 17O, 18O}, and {16O, 17O, 18O}. Eighteen combinations may seem trivial, but for very large molecules there may be millions of possible combinations, so being able to efficiently compute only the k most abundant is very helpful.
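
The CO2 example can be worked through by brute force. This sketch is not the presented algorithm, and it makes several assumptions: the abundances are rounded natural-abundance values used only for illustration, isotopes are treated as independent so a combination's abundance is the product of its isotopes' abundances, and `most_abundant` is a hypothetical helper. Taking the negative logarithm of each abundance is one natural way to turn "largest product" into "smallest sum", which matches the X1+X2+···+Xm form described above.

```python
import heapq
import math
from itertools import product

# Approximate natural abundances (rounded; illustrative values only).
CARBON = {"12C": 0.9893, "13C": 0.0107}
OXYGEN = {"16O": 0.99757, "17O": 0.00038, "18O": 0.00205}


def most_abundant(element_lists, k):
    """Return the k most probable (abundance, isotope names) tuples by brute force."""
    # Each isotope costs -log(abundance); a combination's cost is the sum of costs.
    costs = [
        [(-math.log(p), name) for name, p in element.items()]
        for element in element_lists
    ]
    combos = (
        (sum(c for c, _ in combo), tuple(name for _, name in combo))
        for combo in product(*costs)
    )
    # The smallest summed costs correspond to the largest abundances.
    return [(math.exp(-cost), names) for cost, names in heapq.nsmallest(k, combos)]


for abundance, names in most_abundant([CARBON, OXYGEN, OXYGEN], 3):
    print(f"{abundance:.6f}", names)
```

The tuples here are ordered, so the two oxygen positions are treated as distinct; a fuller treatment would merge combinations that differ only by swapping the oxygens.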

Feb 28th, 11:00 AM Feb 28th, 11:15 AM

Efficiently finding the smallest k values in a large Cartesian product of lists

UC 331
