
Search Techniques for Fourier-Based Learning

Adam Drake and Dan Ventura Computer Science Department Brigham Young University {acd2,ventura}@cs.byu.edu

Abstract
Fourier-based learning algorithms rely on being able to efficiently find the large coefficients of a function's spectral representation. In this paper, we introduce and analyze techniques for finding large coefficients. We show how a previously introduced search technique can be generalized from the Boolean case to the real-valued case, and we apply it in branch-and-bound and beam search algorithms that have significant advantages over the best-first algorithm in which the technique was originally introduced.

Introduction

Fourier-based learning algorithms attempt to learn a function by approximating its Fourier representation. They have been used extensively in learning theory, where properties of the Fourier transform have made it possible to prove many learnability results [Jackson, 1997; Kushilevitz and Mansour, 1993; Linial et al., 1993]. Fourier-based algorithms have also been effectively applied in real-world settings [Drake and Ventura, 2005; Kargupta et al., 1999; Mansour and Sahar, 2000].

In order to approximate a function's Fourier representation, a learning algorithm must be able to efficiently identify the large coefficients of the spectrum. Thus, Fourier-based learning is essentially a search problem, as the effectiveness of this learning approach is tied to the search for large coefficients.

In this paper, we consider the problem of finding large spectral coefficients in real-world settings. Specifically, we consider the scenario in which a learner is asked to learn an unknown function from a set of labeled examples. In the following sections, we briefly discuss the hardness of the search problem, and describe a coefficient search technique that can be incorporated into a variety of search algorithms. We show that a complete branch-and-bound algorithm is as fast as and more memory efficient than a previously introduced algorithm. We also introduce an incomplete beam search algorithm that is always fast and usually able to find solutions that are as good as those found by the complete algorithms.

Fourier-Based Learning

The algorithms discussed in this paper are based on the Fourier transform of Boolean-input functions, which is also known as the Walsh transform.

2.1

The Fourier Transform

Suppose $f$ is a real-valued function of $n$ Boolean inputs (i.e., $f : \{0,1\}^n \to \mathbb{R}$). Then the Fourier spectrum of $f$, denoted $\hat{f}$, is given by

$$\hat{f}(\alpha) = \frac{1}{2^n} \sum_{x \in \{0,1\}^n} f(x)\, \chi_\alpha(x) \tag{1}$$

where $\alpha \in \{0,1\}^n$ and $\hat{f}(\alpha)$ is the spectral coefficient of basis function $\chi_\alpha$. Each $\chi_\alpha : \{0,1\}^n \to \{1,-1\}$ is defined by the following:

$$\chi_\alpha(x) = \begin{cases} 1 & \text{if } \sum_i \alpha_i x_i \text{ is even} \\ -1 & \text{if } \sum_i \alpha_i x_i \text{ is odd} \end{cases} \tag{2}$$

where $\alpha_i$ and $x_i$ denote the $i$th binary digits of $\alpha$ and $x$.

The $2^n$ basis functions of the Fourier transform are XOR functions, each computing the XOR of a different subset of inputs. The subset is determined by $\alpha$, as only inputs for which $\alpha_i = 1$ contribute to the output of $\chi_\alpha$. The order of a basis function is the number of inputs for which $\alpha_i = 1$.

The Fourier coefficients provide global information about $f$. For example, each $\hat{f}(\alpha)$ provides a measure of the correlation between $f$ and $\chi_\alpha$. A large positive or negative value indicates a strong positive or negative correlation, while a small value indicates little or no correlation.

Every $f$ can be expressed as a linear combination of the basis functions, and the Fourier coefficients provide the correct linear combination:

$$f(x) = \sum_{\alpha \in \{0,1\}^n} \hat{f}(\alpha)\, \chi_\alpha(x) \tag{3}$$

Equation 3 is the inverse transform, and it shows how any $f$ can be recovered from its Fourier representation.
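As an illustration of Equations 1 to 3, the following is a minimal Python sketch that computes the exact spectrum by direct summation and then inverts it. The function names (`chi`, `fourier_spectrum`, `reconstruct`) and the toy `target` function are assumptions made for this example; the direct summation is exponential in $n$ and is only meant for very small inputs.

```python
from itertools import product


def chi(alpha, x):
    """Basis function chi_alpha(x): +1 if sum_i alpha_i * x_i is even, else -1."""
    return 1 if sum(a * b for a, b in zip(alpha, x)) % 2 == 0 else -1


def fourier_spectrum(f, n):
    """Equation 1: exact coefficients of f over all 2^n inputs (small n only)."""
    points = list(product((0, 1), repeat=n))
    return {
        alpha: sum(f(x) * chi(alpha, x) for x in points) / 2 ** n
        for alpha in points
    }


def reconstruct(spectrum, x):
    """Equation 3: recover f(x) as a linear combination of basis functions."""
    return sum(coef * chi(alpha, x) for alpha, coef in spectrum.items())


# Hypothetical target: +1 when inputs 0 and 2 agree, -1 otherwise.
# This is exactly the basis function with label (1, 0, 1).
def target(x):
    return 1.0 if x[0] == x[2] else -1.0


spectrum = fourier_spectrum(target, 3)
```

Because the toy target is itself an XOR basis function, all of its spectral weight falls on the single label `(1, 0, 1)`, which is the kind of sparse representation that Fourier-based learning tries to exploit.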

2.2

Learning Fourier Representations

In typical learning scenarios, $f$, the target function, is unknown and must be learned from a set $X$ of $\langle x, f(x) \rangle$ examples of the function. A Fourier-based learning algorithm attempts to learn a linear combination of a subset $B$ of the basis functions that is a good approximation of $f$:

$$f(x) \approx \sum_{\alpha \in B} \hat{f}_X(\alpha)\, \chi_\alpha(x) \tag{4}$$

Here, $\hat{f}_X(\alpha)$ denotes a coefficient approximated from $X$. Since $f$ is only partially known, the true values of the coefficients cannot be known with certainty. However, they can be approximated from the set of examples:

$$\hat{f}_X(\alpha) = \frac{1}{|X|} \sum_{\langle x, f(x) \rangle \in X} f(x)\, \chi_\alpha(x) \tag{5}$$

It can be shown that under certain conditions coefficients approximated in this way will not differ much from the true coefficients [Linial et al., 1993].

Since there are an exponential number of basis functions ($2^n$ basis functions for an $n$-input function), it is not practical to use all basis functions unless $n$ is small. Consequently, a key difference between Fourier-based algorithms is in the choice of which basis functions will be used. (The other basis functions are implicitly assumed to have coefficients of 0.)

One approach is to use all basis functions whose order is less than or equal to some value $k$ [Linial et al., 1993]. This approach has the advantage that no search for basis functions is necessary. However, it has the disadvantage that the class of functions that can be learned is limited to those that can be expressed with only low-order basis functions.

Another approach is to search for and use any basis functions whose coefficients are large (e.g., larger than some threshold $\theta$) [Kushilevitz and Mansour, 1993; Mansour and Sahar, 2000]. Basis functions with large coefficients carry most of the information about a function, and many functions can be approximated well by only the large coefficients of their spectral representations.

A third approach uses boosting [Jackson, 1997]. In this approach, basis functions are added iteratively. In each iteration, a basis function is selected that is highly correlated with a weighted version of the data. Initially, all examples have equal weight, so the first basis function added is correlated with the original data. Thereafter, examples that are classified incorrectly by the previously added basis functions receive more weight, so that basis functions added in subsequent iterations are more correlated with misclassified examples.

The search algorithms described in the following sections are presented with a boosting approach in mind, and each algorithm returns the largest coefficient found during search. However, with simple modification, they can be used to find any desired number of large coefficients, allowing them to be used with the all-large-coefficients Fourier learning approach.

2.3

Generalized Fourier-based Learning

A Fourier-based learning algorithm will likely be most effective when $f$ has, or can be approximated well by, a sparse Fourier representation. Of course, not all functions can be approximated efficiently by a sparse Fourier representation. However, the Fourier learning approach can be generalized to allow other representations, such as wavelet representations [Donoho and Johnstone, 1994; 1995] or representations based on ANDs and ORs of input features [Drake and Ventura, 2005]. The search algorithms presented in this paper can be used with other representations by modifying a procedure described in the following section.

Finding Spectral Coefficients

For a spectral learning algorithm that selects basis functions with large coefficients, the heart of the learning algorithm is the search algorithm used to find large coefficients. In a boosting approach, the key to success is being able to find one large coefficient (per iteration).

3.1

Finding Spectral Coefficients is Hard

As mentioned previously, the number of basis functions is exponential in the number of inputs to the function being learned. A simple brute-force calculation of spectral coefficients to find the largest would require $O(|X| \cdot 2^n)$ time. ($O(|X|)$ time is required to compute a single coefficient, and there are $2^n$ coefficients to consider.) Meanwhile, the Fast Walsh Transform algorithm (a Boolean-input analogue to the Fast Fourier Transform) requires $O(n 2^n)$ time. Both of these approaches are practical only for small learning problems.

Most previous work in spectral learning has either been in small domains (where a brute-force approach is feasible), made assumptions about which coefficients might be large (which limits applicability), or relied on conditions that are not practical in many real-world situations (such as requiring an oracle that can be queried for examples during training). This paper considers algorithms for the more general learning scenario in which brute-force computation may be infeasible, no assumptions about the coefficients can be made, and the set of training examples may be fixed prior to learning.

Unfortunately, it can be shown that the problem of finding a large spectral coefficient with respect to a set of examples is as hard as solving the MAX-2-SAT problem, and is therefore an NP-complete problem. The following theorem expresses this idea for the Fourier spectrum.

Theorem 1. Let $X$ be a set of $\langle x, f(x) \rangle$ pairs, where $x \in \{0,1\}^n$ and $f(x) \in \mathbb{R}$. Given $\theta \in \mathbb{R}$, the problem of determining whether $\exists \alpha \in \{0,1\}^n$ such that $|\hat{f}_X(\alpha)| \geq \theta$ is NP-complete.

Proof Sketch. The proof is by reduction from MAX-2-SAT. The key observation in the proof is that every CNF expression can be converted in polynomial time to a set $X$ of $\langle x, f(x) \rangle$ pairs such that each coefficient of $\hat{f}_X$ gives the number of clauses that will be satisfied by one of the possible truth assignments. Thus, solving the large Fourier coefficient problem for $X$ would give the MAX-2-SAT solution for the CNF expression.

This coefficient search result generalizes to a class of spectral representations with certain properties.
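For reference, the estimate in Equation 5 and the brute-force baseline discussed in Section 3.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the five-example training set are made up for the example, and the search enumerates all $2^n$ labels, so it is usable only when $n$ is small.

```python
from itertools import product


def chi(alpha, x):
    """Basis function chi_alpha(x): +1 if sum_i alpha_i * x_i is even, else -1."""
    return 1 if sum(a * b for a, b in zip(alpha, x)) % 2 == 0 else -1


def estimate_coefficient(examples, alpha):
    """Equation 5: coefficient of chi_alpha estimated from <x, f(x)> pairs."""
    return sum(fx * chi(alpha, x) for x, fx in examples) / len(examples)


def brute_force_largest(examples, n):
    """Try all 2^n labels; O(|X| * 2^n) time overall."""
    labels = product((0, 1), repeat=n)
    return max(labels, key=lambda a: abs(estimate_coefficient(examples, a)))


# Made-up training set: f(x) is negative exactly when x[1] = 1.
examples = [
    ((0, 0, 0), 2.0),
    ((0, 1, 0), -1.5),
    ((1, 0, 1), 1.0),
    ((1, 1, 0), -2.5),
    ((0, 1, 1), -1.0),
]
best = brute_force_largest(examples, 3)
best_coef = estimate_coefficient(examples, best)
```

On this toy data the label `(0, 1, 0)` matches the sign of every example, so its estimated coefficient equals the average of $|f(x)|$, which is the largest value any coefficient can take.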

3.2

Searching by Bounding Coefcient Size

Fortunately, although the NP-completeness result suggests that an efficient algorithm for finding large coefficients in arbitrary spectra does not exist, a technique for bounding coefficient size in any given region of the spectrum makes it possible to find large coefficients much more efficiently than with the previously mentioned approaches. A Boolean function version of this technique was used in the context of a best-first search algorithm [Drake and Ventura, 2005]. Here we generalize the technique to handle real-valued functions and show that it can be incorporated into search algorithms that are capable of handling larger problems.

In this section we will consider only the case of searching the Fourier (XOR) spectrum. With some modification this technique can be used to find coefficients in other spectra.

To explain the technique we will introduce some additional notation. Let $\beta \in \{0,1,*\}^n$, where $*$ is a wildcard value, be a partially or fully defined basis function label that represents a region of the Fourier spectrum. Specifically, $\beta$ represents the region of the spectrum consisting of all $\alpha$ such that $\forall i\, (\beta_i = * \lor \beta_i = \alpha_i)$. We will use the notation $\alpha \in \beta$ to denote an $\alpha$ in region $\beta$.

Suppose that the coefficient search space is a tree of nodes with one node for each possible $\beta$. The search begins at the root node, which has label $\beta = *^n$. As the search proceeds, nodes are expanded by replacing them with child nodes whose labels have one of the wildcards set to either 0 or 1. Both children set the same wildcard, with one child taking the value 0 and the other taking the value 1. A leaf node, which will have no wildcards in its label, is a solution, and its label corresponds to the label of a specific basis function.

In addition to the label $\beta$, there is a set of examples, $X_\beta$, associated with each node that facilitates the computation of coefficient bounds by implicitly storing in child nodes information obtained while computing bounds for parent nodes. For the root node, $X_\beta = X$. For any other node, $X_\beta$ is derived from its parent by the procedure shown in Figure 1.

     CreateChild(parentNode, i, value)
(1)    child.β ← parentNode.β
(2)    child.β_i ← value
(3)    child.X ← ∅
(4)    for each ⟨x, f(x)⟩ ∈ parentNode.X
(5)      v ← x
(6)      f(v) ← f(x)
(7)      if v_i = 1
(8)        v_i ← 0
(9)        if value = 1
(10)         f(v) ← −f(v)
(11)     if ∃⟨z, f(z)⟩ ∈ child.X such that v = z
(12)       ⟨z, f(z)⟩ ← ⟨z, f(z) + f(v)⟩
(13)     else
(14)       child.X ← child.X ∪ ⟨v, f(v)⟩
(15)   return child

Figure 1: Procedure for obtaining children of a Fourier coefficient search node. $i$ is the digit of the parent node's label that is to be set, and value is the value it is to be set to.

Using this procedure, the size of the largest possible Fourier coefficient in region $\beta$ can be bounded as follows:

$$\max_{\alpha \in \beta} |\hat{f}_X(\alpha)| \leq \frac{\sum_{\langle x, f(x) \rangle \in X_\beta} |f(x)|}{|X|} \tag{6}$$

To give some intuition behind this technique, note that the largest possible Fourier coefficient for a given $X$ is given by

$$\max_{\alpha \in *^n} |\hat{f}_X(\alpha)| \leq \frac{\sum_{\langle x, f(x) \rangle \in X} |f(x)|}{|X|}$$

which is the coefficient bound of the root node. This maximum correlation exists only if there is an $\alpha$ such that either $\mathrm{sign}(f(x)) = \mathrm{sign}(\chi_\alpha(x))$ or $\mathrm{sign}(f(x)) = -\mathrm{sign}(\chi_\alpha(x))$ for all $\langle x, f(x) \rangle \in X$. To the extent that a basis function, or set of basis functions, does not exhibit either of these correlations, coefficient size drops.

The procedure described in Figure 1 captures this idea at line 12, where examples are merged. Examples are merged here only if they are identical on inputs for which $\beta_i$ has not yet been set. This can be done because the effect of inputs for which $\beta_i$ has been set has already been taken into account (by inverting outputs, at line 10, whenever $\beta_i = v_i = 1$). When examples are merged, the coefficient bound may decrease since $|f(x) + f(v)| < |f(x)| + |f(v)|$ when $\mathrm{sign}(f(x)) \neq \mathrm{sign}(f(v))$.

Note that for a solution node, where $\beta \in \{0,1\}^n$, all examples will have merged into one, and the coefficient of $\chi_\beta$ is given by

$$\hat{f}_X(\beta) = \frac{\sum_{\langle x, f(x) \rangle \in X_\beta} f(x)}{|X|}$$

while the magnitude of the coefficient is given by the absolute value of that quantity.

The following sections describe search algorithms that all use this coefficient bounding technique but explore the search space in different ways. Two of the methods are complete, while the other is an incomplete search technique.

Complete Search Techniques

The complete search algorithms presented here always find the largest coefficients, but may require exponential time and/or memory to do so. The first is a previously introduced best-first search algorithm [Drake and Ventura, 2005], and the second is a branch-and-bound search algorithm. Empirical results show that the branch-and-bound algorithm can find the largest coefficient in about the same amount of time as the best-first algorithm, while requiring far less memory.

With a boosting approach in mind, both algorithms are presented here as if the task were to find the single largest coefficient. With simple modification, the algorithms can be used to find the $r$ largest coefficients or to find all coefficients larger than a threshold, which would be done in the find-all-large-coefficients approach.

4.1

Best-First Search

The best-first search algorithm is outlined in Figure 2. Like the other search algorithms, it begins at the root node. After expanding the first node, however, it explores the space in order of $\max_{\alpha \in \beta} |\hat{f}_X(\alpha)|$, where $\beta$ is the node's label. Nodes are stored in a priority queue in which the highest priority element is the node with the largest coefficient bound. Since nodes are visited in best-first order, relatively few nodes are visited unnecessarily. However, the entire frontier of the search must be stored in memory, which can exhaust resources fairly quickly if the search becomes large.
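The bounding procedure of Figure 1 and the best-first search described above might be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the dictionary-based example store, the `WILDCARD` marker, the function names, and the toy data are all assumptions, and inputs are set in a fixed left-to-right order instead of using a variable ordering heuristic.

```python
import heapq
from itertools import count

WILDCARD = None  # assumed marker for a label digit that has not been set


def create_child(label, examples, i, value):
    """Figure 1: set digit i of the label to `value` and merge examples."""
    child_label = label[:i] + (value,) + label[i + 1:]
    merged = {}
    for x, fx in examples.items():
        if x[i] == 1:
            x = x[:i] + (0,) + x[i + 1:]  # clear the input that was just set
            if value == 1:
                fx = -fx  # invert the output when beta_i = x_i = 1
        merged[x] = merged.get(x, 0.0) + fx  # merge identical examples
    return child_label, merged


def bound(examples, total):
    """Equation 6: sum of |f(x)| over the node's examples, divided by |X|."""
    return sum(abs(fx) for fx in examples.values()) / total


def best_first_largest(pairs, n):
    """Return (label, coefficient) of the largest-magnitude coefficient."""
    total = len(pairs)
    root = {}
    for x, fx in pairs:  # duplicates of x are merged up front
        root[tuple(x)] = root.get(tuple(x), 0.0) + fx
    tie = count()  # tie-breaker so the heap never compares labels
    heap = [(-bound(root, total), next(tie), (WILDCARD,) * n, root)]
    while True:
        _, _, label, examples = heapq.heappop(heap)
        if WILDCARD not in label:  # solution node: the bound is exact
            return label, sum(examples.values()) / total
        i = label.index(WILDCARD)  # arbitrary ordering: first unset input
        for value in (0, 1):
            child_label, child_examples = create_child(label, examples, i, value)
            key = -bound(child_examples, total)
            heapq.heappush(heap, (key, next(tie), child_label, child_examples))


# Made-up training set: f(x) is negative exactly when x[1] = 1.
pairs = [
    ((0, 0, 0), 2.0),
    ((0, 1, 0), -1.5),
    ((1, 0, 1), 1.0),
    ((1, 1, 0), -2.5),
    ((0, 1, 1), -1.0),
]
label, coef = best_first_largest(pairs, 3)
```

Negated bounds are pushed because `heapq` is a min-heap. The first solution node removed from the queue is optimal, since its exact coefficient magnitude is at least as large as the bound of every node still on the frontier.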

FindLargeCoef-BestFirst(X)
  initialNode.β ← *^n
  initialNode.X ← X
  priorityQueue.insert(initialNode)
  while priorityQueue.front() is not a solution node
    node ← priorityQueue.removeFront()
    i ← GetInputToSetNext(node)
    priorityQueue.insert(CreateChild(node, i, 0))
    priorityQueue.insert(CreateChild(node, i, 1))
  return priorityQueue.front().β

Figure 2: The best-first search algorithm. Nodes are stored in a priority queue that always places the node with the largest coefficient bound at the front.

4.2

Branch-and-Bound Search

The branch-and-bound search algorithm is outlined in Figure 3. It is a depth-first search in which search paths are pruned whenever a node's coefficient bound ($\max_{\alpha \in \beta} |\hat{f}_X(\alpha)|$) is below the best solution found so far ($|\hat{f}_X(\alpha_{best})|$). Figure 3 illustrates the use of a stack to perform this search, although it can be implemented using recursion as well.

The branch-and-bound algorithm will tend to visit more nodes than the best-first algorithm, but it has less overhead, so it can visit more nodes per second. In addition, its memory usage is linear in $n$, the number of inputs. If examples at each node are stored in a hash table of size $h$, then the algorithm's space complexity is $O(n(|X| + h))$. By comparison, the space complexity of the best-first algorithm is $O(m(|X| + h))$, where $m$, $n \leq m \leq 2^n$, is the number of nodes expanded during search. (For every node expanded and removed from the queue, two are added, so the queue size increases by one each time a node is expanded.)

FindLargeCoef-BranchAndBound(X)
  initialNode.β ← *^n
  initialNode.X ← X
  stack.push(initialNode)
  while stack is not empty
    node ← stack.pop()
    if $\max_{\alpha \in node.\beta} |\hat{f}_X(\alpha)| > |\hat{f}_X(\alpha_{best})|$
      if node is a solution
        α_best ← node.β
      else
        i ← GetInputToSetNext(node)
        stack.push(CreateChild(node, i, 1))
        stack.push(CreateChild(node, i, 0))
  return α_best

Figure 3: The branch-and-bound search algorithm. Nodes are visited depth-first, and nodes whose coefficient bound is below the largest coefficient found so far ($|\hat{f}_X(\alpha_{best})|$) are pruned from the search.

4.3

Variable Ordering

In both algorithms, when a node is expanded, an input $i$ for which $\beta_i = *$ is selected to be set to 0 and 1 in the child nodes (as indicated by the GetInputToSetNext function in Figures 2 and 3). The choice of $i$ does not affect the completeness of either algorithm, so inputs could be processed in an arbitrary order. However, both algorithms benefit greatly from a dynamic variable ordering scheme. The variable ordering heuristic used in the experiments that follow selects an input according to the following:

$$\operatorname*{argmin}_i \left( \max_{\alpha \in \beta_{i \leftarrow 0}} |\hat{f}_X(\alpha)| + \max_{\alpha \in \beta_{i \leftarrow 1}} |\hat{f}_X(\alpha)| \right) \tag{7}$$

in which $\beta_{i \leftarrow 0}$ and $\beta_{i \leftarrow 1}$ denote the labels of the child nodes that result from setting $\beta_i$ to 0 and 1, respectively. This heuristic chooses the input that results in the tightest (i.e., smallest) combined coefficient bounds in the children. By obtaining tighter bounds more quickly, it is possible to more quickly determine which portions of the space can be ignored.

4.4

Complete Search Results

This section compares the performance of the algorithms on several data sets [Newman et al., 1998], which are summarized in Table 1. Each data set represents a Boolean classification problem. In each data set, non-Boolean input features were converted into Boolean features. Each numeric input feature was converted into a single Boolean feature that indicated whether the value was above or below a threshold. Each nominal input feature was converted into a group of $m$ Boolean features (one for each possible value of the feature), where only the Boolean feature corresponding to the correct nominal value would be true in an example. The number of inputs listed in Table 1 is the number of inputs after converting non-Boolean inputs to Boolean.

Table 1: Data set summary.

Data Set   Inputs   Examples
Chess      38       3,196
German     24       1,000
Heart      20       270
Pima       8        768
SPECT      22       267
Voting     16       435
WBC1       9        699
WBC2       32       198
WBC3       30       569

Table 2: Average number of nodes visited while searching for the largest Fourier coefficient. The variable ordering heuristic (H) reduces the number of nodes visited by both algorithms, while the best-first (BFS) algorithm usually visits fewer nodes than the branch-and-bound (B&B) algorithm.

Data Set   B&B         BFS       B&B+H     BFS+H
Chess      432,986.1   -         2,012.5   296.0
German     4,097.8     3,912.2   155.0     135.5
Heart      4,072.8     3,935.4   196.4     184.4
Pima       33.4        20.0      33.8      13.3
SPECT      8,007.9     8,040.8   1,170.2   1,170.7
Voting     52.2        28.3      26.0      26.0
WBC1       24.8        21.6      17.0      17.0
WBC2       252,833.5   -         479.3     339.3
WBC3       6,936.2     6,500.7   106.1     101.8

Table 3: Average time (in seconds) to find the largest Fourier coefficient. Although the best-first algorithm (BFS) usually visits fewer nodes, its run time is roughly equivalent to that of the branch-and-bound (B&B) algorithm.

Data Set   B&B      BFS    B&B+H   BFS+H
Chess      242.16   -      1.39    0.91
German     1.11     1.94   0.11    0.12
Heart      0.40     0.58   0.04    0.05
Pima       0.00     0.00   0.00    0.00
SPECT      1.11     2.08   0.42    0.46
Voting     0.00     0.00   0.00    0.00
WBC1       0.00     0.00   0.00    0.00
WBC2       23.36    -      0.11    0.12
WBC3       1.48     2.55   0.05    0.05

Table 4: Average memory usage (in terms of the number of nodes) during a search for the largest Fourier coefficient. The branch-and-bound (B&B) algorithm's memory usage is proportional to the number of inputs, and tends to be much less than that of the best-first algorithm (BFS), whose memory usage is proportional to the number of visited nodes.

Data Set   B&B/B&B+H   BFS       BFS+H
Chess      39          -         297.0
German     25          3,913.2   136.5
Heart      21          3,936.4   185.4
Pima       9           21.0      14.3
SPECT      23          8,041.8   1,170.7
Voting     17          29.3      27.0
WBC1       10          22.6      18.0
WBC2       33          -         340.3
WBC3       31          6,501.7   102.8

Tables 2-4 compare the performance of the best-first and branch-and-bound algorithms, both with and without the variable ordering heuristic, when used to find the largest coefficient for each of the data sets. Tables 2, 3, and 4 show the average number of nodes expanded, the average amount of time required, and the average memory usage, respectively. (Note that for two of the data sets the best-first algorithm without the variable ordering heuristic ran out of memory, so results are not displayed in those two cases.)

As the node counts in Table 2 show, the variable ordering heuristic drastically reduces the number of nodes visited by both algorithms on the larger search problems. Table 3 shows that this improvement in the number of visited nodes leads to smaller run times as well, in spite of the added computation required to compute the heuristic.

Comparing the best-first and branch-and-bound algorithms, Table 2 shows that the best-first algorithm typically visits fewer nodes. However, Table 3 shows that the algorithms require about the same amount of time to find a solution, so the additional overhead of the best-first algorithm appears to offset the potential gain of visiting fewer nodes. Table 4 demonstrates the branch-and-bound algorithm's memory advantage. Perhaps more important than the differences observed here, however, is the fact that the branch-and-bound algorithm's worst-case memory complexity is linear in $n$, while the best-first search algorithm's is exponential.

FindLargeCoef-BeamSearch(X, k)
  initialNode.β ← *^n
  initialNode.X ← X
  bestNodes ← {initialNode}
  for j ← 1 to n
    priorityQueue ← ∅
    for each node ∈ bestNodes
      i ← GetInputToSetNext(node)
      priorityQueue.insert(CreateChild(node, i, 0))
      priorityQueue.insert(CreateChild(node, i, 1))
    bestNodes ← ∅
    while bestNodes.size() < k
      node ← priorityQueue.removeFront()
      bestNodes ← bestNodes ∪ node
  return bestNodes.best().β

Figure 4: The beam search algorithm. The search space is explored breadth-first, with all but the $k$ most promising nodes at each level being pruned.

Incomplete Search Techniques

An alternative to the previously introduced complete search algorithms is an incomplete algorithm that may not always find the largest coefficients but is guaranteed to finish its search quickly and usually finds good solutions.

5.1

Beam Search

The beamed breadth-first search described in Figure 4 explores the search space in a breadth-first manner, but at each level of the tree it prunes all but the best $k$ nodes, ensuring that the number of nodes under consideration stays tractable. The number of nodes that will be considered by the beam search algorithm is $O(nk)$, where $k$ is the width of the beam. Thus, unlike the complete algorithms, its worst-case time complexity is not exponential in $n$. Its space complexity is $O(k(|X| + h))$, where, as before, $h$ is the size of the hash table containing the examples in a node.

Like the best-first and branch-and-bound algorithms, the beam search algorithm can use an arbitrary variable ordering, but it can benefit from a good heuristic. Here, the motivation for the heuristic is different. In the case of the complete algorithms, the heuristic is used to reduce the number of nodes in the search space that need to be visited and has no effect on the solution. For the beam search, however, the number of nodes that will be considered is fixed, and the heuristic is used to improve the solution. Without a heuristic, it is possible for early decisions to be made on inputs for which there is little information, meaning that partially-defined labels that lead to good solutions might get pruned before the inputs that would reveal their usefulness were considered. The variable selection heuristic defined previously (Equation 7) favors inputs that tighten the coefficient bounds in the child nodes, making it less likely for good nodes to be pruned.
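To make the comparison between the beam search and a complete search concrete, the following Python sketch implements both on top of the same bounding procedure, using the variable ordering heuristic of Equation 7 with the coefficient bounds of the two children as the quantities being summed. It is a sketch under stated assumptions, not the authors' code: the names (`expand`, `make_root`, `branch_and_bound`, `beam_search`), the `WILDCARD` marker, the dictionary-based example store, and the toy data are all hypothetical.

```python
import heapq

WILDCARD = None  # assumed marker for a label digit that has not been set


def create_child(label, examples, i, value):
    """Figure 1: set digit i of the label to `value` and merge examples."""
    child_label = label[:i] + (value,) + label[i + 1:]
    merged = {}
    for x, fx in examples.items():
        if x[i] == 1:
            x = x[:i] + (0,) + x[i + 1:]
            if value == 1:
                fx = -fx
        merged[x] = merged.get(x, 0.0) + fx
    return child_label, merged


def bound(examples, total):
    """Equation 6: upper bound on |coefficient| within the node's region."""
    return sum(abs(fx) for fx in examples.values()) / total


def expand(label, examples, total):
    """Equation 7: split on the unset input with the smallest combined bounds."""
    best = None
    for i, digit in enumerate(label):
        if digit is WILDCARD:
            children = [create_child(label, examples, i, v) for v in (0, 1)]
            score = sum(bound(ex, total) for _, ex in children)
            if best is None or score < best[0]:
                best = (score, children)
    return best[1]  # [child with value 0, child with value 1]


def make_root(pairs, n):
    root = {}
    for x, fx in pairs:
        root[tuple(x)] = root.get(tuple(x), 0.0) + fx
    return (WILDCARD,) * n, root


def branch_and_bound(pairs, n):
    """Complete depth-first search with pruning (in the spirit of Figure 3)."""
    total = len(pairs)
    best_label, best_coef = None, 0.0
    stack = [make_root(pairs, n)]
    while stack:
        label, examples = stack.pop()
        if bound(examples, total) > abs(best_coef):
            if WILDCARD not in label:
                best_label, best_coef = label, sum(examples.values()) / total
            else:
                child0, child1 = expand(label, examples, total)
                stack.append(child1)
                stack.append(child0)
    return best_label, best_coef


def beam_search(pairs, n, k):
    """Incomplete search keeping the k best nodes per level (Figure 4)."""
    total = len(pairs)
    beam = [make_root(pairs, n)]
    for _ in range(n):
        children = [c for label, ex in beam for c in expand(label, ex, total)]
        beam = heapq.nlargest(k, children, key=lambda c: bound(c[1], total))
    label, examples = beam[0]  # nlargest returns nodes in descending order
    return label, sum(examples.values()) / total


# Made-up training set: f(x) is negative exactly when x[1] = 1.
pairs = [
    ((0, 0, 0), 2.0),
    ((0, 1, 0), -1.5),
    ((1, 0, 1), 1.0),
    ((1, 1, 0), -2.5),
    ((0, 1, 1), -1.0),
]
complete_result = branch_and_bound(pairs, 3)
beam_result = beam_search(pairs, 3, k=1)
```

On this small example even a beam of width 1 reaches the same label as the complete search. In general the beam search offers no such guarantee, which is the trade-off described above: at most $k$ nodes are kept per level, so the work grows with $nk$ rather than exponentially in $n$.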

5.2

Incomplete Search Results

Table 5 shows the result of attempting to learn functions with a single basis function that is found by a beam search. The accuracies shown are the average test accuracies (over 100 trials) when training and testing on random 90%/10% training/test splits of the data. A bold highlight indicates a result that is not significantly worse than the result obtained with a complete search, as measured by a paired random permutation test (p = 0.05).

As the table shows, the beam does not usually need to be very wide before the beam search does about as well as a complete search. In fact, sometimes the beam search performs better. Although we expect that basis functions with larger coefficients will usually be better models of the unknown function $f$, a larger coefficient only implies greater correlation with $X$, and not necessarily with $f$. This uncertainty is advantageous to the beam search, as its solutions, which may be sub-optimal with respect to $X$, may often be as good for the learning task as the solutions returned by a complete search.

Table 6 shows that the beam search can find its solutions in less time than the branch-and-bound algorithm. The computational advantage of the beam search will be more important when spectral techniques are applied to higher dimensional problems involving hundreds or thousands of input features. Preliminary experiments with natural language processing tasks involving thousands of input features suggest that the beam search approach can still be used to find good solutions long after it has become infeasible to compute exact solutions.

Table 5: Learning accuracy when using a single basis function obtained by a beam search of the given width. A bold highlight indicates a result that is not significantly worse (statistically) than the infinite-beam result. In most cases, a relatively small beam is sufficient to match the accuracy obtained with an infinitely large beam.

                      Beam Width
Data Set   1       2       4       8       ∞
Chess      62.8%   73.7%   74.7%   75.0%   75.1%
German     58.5%   68.8%   71.1%   71.9%   72.7%
Heart      53.3%   59.8%   73.7%   74.7%   72.9%
Pima       67.2%   73.1%   73.6%   73.8%   73.7%
SPECT      59.5%   61.8%   75.0%   77.6%   77.6%
Voting     82.8%   96.0%   96.0%   96.0%   96.0%
WBC1       83.7%   89.7%   91.7%   91.2%   91.3%
WBC2       55.4%   54.4%   60.2%   61.0%   72.2%
WBC3       80.8%   88.9%   88.9%   89.0%   87.6%

Table 6: Average training time (in seconds) of the spectral learning algorithm when using the branch-and-bound and beam search algorithms to find coefficients. On the larger problems (shown here), the beam search is much faster, and its solutions (see Table 5) are usually equally good. (On the smaller problems the difference in time is negligible.)

Data Set   B&B+H   Beam (width = 8)
Chess      1.49    0.23
German     0.12    0.04
Heart      0.04    0.01
SPECT      0.48    0.02
WBC2       0.13    0.02

Conclusion

In this paper we have considered the problem of finding large coefficients in the Fourier spectrum, which is a central task in Fourier-based learning. We have described a technique for efficiently bounding Fourier spectra that can be incorporated into different types of search algorithms. Empirical results show that a complete branch-and-bound algorithm based on the technique outperforms a previously introduced best-first algorithm by finding solutions in the same amount of time while using less memory. Meanwhile, experiments with an incomplete beam search algorithm demonstrate that even using a small beam width it is possible to find solutions that result in learning accuracy comparable to that obtained by a complete algorithm.

References

[Donoho and Johnstone, 1994] D. Donoho and I. Johnstone. Ideal Spatial Adaptation by Wavelet Shrinkage. Biometrika, 1994.

[Donoho and Johnstone, 1995] D. Donoho and I. Johnstone. Adapting to Unknown Smoothness via Wavelet Shrinkage. Journal of the American Statistical Association, 1995.

[Drake and Ventura, 2005] A. Drake and D. Ventura. A practical generalization of Fourier-based learning. In Proceedings of the 22nd International Conference on Machine Learning, pages 185-192, 2005.

[Jackson, 1997] J. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55:414-440, 1997.

[Kargupta et al., 1999] H. Kargupta, B. Park, D. Hershberger, and E. Johnson. Collective data mining: A new perspective toward distributed data mining. In Advances in Distributed Data Mining. AAAI/MIT Press, 1999.

[Kushilevitz and Mansour, 1993] E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331-1348, 1993.

[Linial et al., 1993] N. Linial, Y. Mansour, and N. Nisan. Constant depth circuits, Fourier transform, and learnability. Journal of the ACM, 40(3):607-620, 1993.

[Mansour and Sahar, 2000] Y. Mansour and S. Sahar. Implementation issues in the Fourier transform algorithm. Machine Learning, 14:5-33, 2000.

[Newman et al., 1998] D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz. UCI repository of machine learning databases, 1998.
