Heuristic vs Exact Optimization

Some context

My Master's is in Data Science, and we are allowed to specialize in different areas of the field. I essentially specialized in (1) general Machine Learning and (2) Optimization.

The two most important courses I took related to Optimization were one on exact (mathematical programming) optimization and one on heuristic optimization.

In each of them, we had to work on a big project: we focused on a specific problem and tried to solve it by applying exact or heuristic optimization, respectively.

My take on heuristic vs exact optimization algorithms

If you had asked me some time ago, I would have said that exact optimization is the only right way, and that providing solutions without optimality guarantees is for losers.

Naturally, I still find more mathematical beauty in exact approaches. But my opinion on this topic has changed because of two ideas:

Heuristic and exact optimization are not really two separate worlds. Exact optimization solvers critically rely on primal heuristics to find good incumbent solutions. Moreover, it is common practice for an OR practitioner to first find a feasible solution heuristically, and then feed it to an exact solver to improve it and prove optimality. On the other hand, heuristic optimization should never forget about the strengths of exact solvers. I would argue that even if you plan to solve a problem heuristically, writing a MIP formulation and trying to solve the small instances with a solver is the right first step; heuristic methods should then help you find solutions to the instances that are intractable for the solver. Moreover, the root-node heuristics of exact solvers are incredibly strong, and you can take the solution they provide and improve it further with hand-crafted heuristics.
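As a tiny illustration of the heuristic-then-exact workflow, here is a minimal sketch (a toy knapsack instance of my own, not from the projects below) of how a greedy solution can be handed to Gurobi as a MIP start through the variables' Start attribute:

```python
import gurobipy as gp
from gurobipy import GRB

values = [10, 13, 7, 8, 6]
weights = [5, 8, 3, 4, 2]
capacity = 10

# Cheap heuristic: take items by value/weight ratio while they fit.
order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
greedy, used = set(), 0
for i in order:
    if used + weights[i] <= capacity:
        greedy.add(i)
        used += weights[i]

m = gp.Model("knapsack")
x = m.addVars(len(values), vtype=GRB.BINARY, name="x")
m.addConstr(gp.quicksum(weights[i] * x[i] for i in range(len(values))) <= capacity)
m.setObjective(gp.quicksum(values[i] * x[i] for i in range(len(values))), GRB.MAXIMIZE)

# Feed the heuristic solution as a MIP start: the solver begins from this
# incumbent and only has to improve on it and prove optimality.
for i in range(len(values)):
    x[i].Start = 1.0 if i in greedy else 0.0

m.optimize()
```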

Optimality, if it is not easy to attain, is useless in practical applications. I attended a talk given by Professor Ruben Ruiz (AWS) at the EURO Conference 2024. ChatGPT summarizes his talk like this:

In OR research and academia, there is a strong focus on developing complex, problem-specific algorithms to achieve near-optimal solutions, often adding complexity to achieve marginal gains in optimality gaps. However, these intricate methods can be impractical for industrial problems that require flexibility, maintainability, and quick adaptability. In real-world scenarios with large datasets and approximated inputs, a slight loss in optimality is acceptable in exchange for more robust and pragmatic solutions.

For example, consider two fictional papers:

  1. A QUBO–Benders Decomposition with Branch-and-Cut and a Neural QUBO Solver for the Bin Packing Problem → their algorithm only provides decent optimality proofs for instances of size < 100 and simply doesn't work for instances of size > 1000
  2. A Simple GRASP Heuristic for Large Vector Bin Packing with Heterogeneous Bins Problems → they can quickly provide solutions for instances of size 1M

The point of the talk is that paper (1) is regarded more highly than paper (2). However, its practical applications are close to zero: pure Bin Packing problems do not exist in the real world, and industry problems are far larger. So, basically, academia is optimally solving problems that we don't have, instead of heuristically solving the problems that we do.

Complete abstract of Ruben Ruiz's talk at EURO2024

Title: Optimal is the enemy of good: Solving very large scale virtual machine placement problems at Amazon Elastic Compute Cloud

Abstract: A significant portion of the effort within the field of Operations Research is dedicated to the development of intricate ad-hoc algorithms and models tailored for addressing diverse optimization problems. Frequently, the research community exhibits a preference for complexity in the proposed methodologies. Notably, an esteemed characteristic sought in these methods is the incorporation of problem-specific knowledge, enabling the attainment of results that get as close as possible to optimality.

In academic publishing, the pursuit of marginal gains in optimality gaps is a common goal, even at the expense of introducing additional complexity to models or algorithms. While there is consensus that such endeavors contribute to advancements in algorithms and propel the field of Operations Research forward, a frequently underestimated dimension is the practical applicability of these advancements in industrial settings.

Real industrial problems are often messy with imprecise (or numerous) objectives, soft constraints, preferences and a myriad of situations that demand pragmatic approaches. Additionally, these problems undergo rapid evolution with frequent model refinements, occurring often fortnightly.

Here, the ongoing painful adaptation of intricate code bases stemming from ad-hoc methods is not the favored choice. In contrast, solvers offer greater flexibility, allowing faster implementation of new constraints through their APIs than altering tailored algorithms. In industrial contexts, the preference lies in flexibility, maintainability, robustness, and reduced operational load, where a modest optimality gap is deemed a minor trade-off.

Moreover, in the realm of large-scale real-world problems, mathematical solvers often prove impractical. It is commonplace to resort to simplifying the problem for solvability. This practice raises a critical question: Is it logical to insist on chasing optimality in solving a simplified problem with a reduced real-world fidelity? Furthermore, real problems entail vast datasets, often including forecasted or approximated input data. Does it make sense to go to great lengths to optimally solve a problem when a portion of the input data involves approximations?

In this presentation, I advocate for relinquishing the pursuit of optimality and, instead, embracing heuristic solvers and simplified modeling approaches. Such strategies yield expedient, adaptable, maintainable, and easily implementable models. The discourse will draw upon various examples, spanning classical routing and scheduling problems, culminating in intricate real-world virtual machine placement bin packing problems encountered at Amazon Elastic Compute Cloud (EC2).

A question I still have:

What is the best framework to build a fully heuristic solver around? In particular, would Branch and Bound / Branch and Cut (without much focus on the dual side) be the best choice?
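To make the question concrete, here is a generic sketch of what I mean: a best-first branch-and-bound loop in which the pruning bound is a cheap heuristic estimate rather than a valid dual bound (all the problem-specific hooks, namely branch, heuristic_bound, is_complete, and cost, are hypothetical):

```python
# If heuristic_bound is not a valid lower bound, pruning may discard the
# optimum, so the whole search becomes a heuristic: exactly the trade-off
# the question is about.
import heapq

def heuristic_branch_and_bound(root):
    best, best_cost = None, float("inf")
    counter = 0                      # tie-breaker so nodes are never compared
    heap = [(heuristic_bound(root), counter, root)]
    while heap:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_cost:
            continue                 # estimated to be no better: prune
        if is_complete(node):
            if cost(node) < best_cost:
                best, best_cost = node, cost(node)
            continue
        for child in branch(node):   # generate subproblems
            b = heuristic_bound(child)
            if b < best_cost:
                counter += 1
                heapq.heappush(heap, (b, counter, child))
    return best, best_cost
```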

MIP optimization for the k-Minimum Spanning Tree problem

You can take a look at the repository for this project here: jsalvasoler/K-MST-with-MILP-optimization. In the end, this is just a classic modeling exercise: come up with different MIP models and compare how a solver performs on them. But I enjoyed it. 🙂

In this project, I provided different MILP formulations for the k-Minimum Spanning Tree Problem. In particular, I defined:

  • a sequential formulation (inspired by Miller, Tucker and Zemlin, MTZ)
  • a Single-Commodity Flow formulation, SCF
  • a Multi-Commodity Flow formulation, MCF
  • a Cycle Elimination Constraints formulation, CEC
  • a Directed Cutset Constraints formulation, DCC

I used ortools and gurobipy for the implementation, and I compared the formulations on a set of 10 graph instances. I compared the licensed solver Gurobi 10.0.1 against the open-source SCIP 8.0.3 on the compact models; only Gurobi was used for the formulations with exponentially many constraints.
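For the compact models, OR-Tools' linear solver wrapper makes it easy to run the same model on different backends. A minimal sketch (a toy model, not the project code), assuming the SCIP backend is available:

```python
# A toy model built with OR-Tools' pywraplp wrapper, which can be solved by
# different backends ("SCIP" is bundled; "GUROBI" requires a license).
from ortools.linear_solver import pywraplp

def solve_tiny_model(backend: str) -> float:
    solver = pywraplp.Solver.CreateSolver(backend)
    if solver is None:
        raise RuntimeError(f"backend {backend!r} is not available")
    x = solver.BoolVar("x")
    y = solver.BoolVar("y")
    solver.Add(x + y <= 1)          # at most one of the two
    solver.Maximize(3 * x + 2 * y)
    status = solver.Solve()
    assert status == pywraplp.Solver.OPTIMAL
    return solver.Objective().Value()

print(solve_tiny_model("SCIP"))     # same model, swap "SCIP" for "GUROBI"
```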

Here is an example of the CEC formulation, which, together with MTZ, turned out to perform best.

This model uses binary variables $x_e,\ e \in E$, and $z_i,\ i \in V$, which select the edges and nodes that are part of the tree. We use the notation $E(S) = \lbrace \lbrace i, j \rbrace \in E \mid i, j \in S \rbrace$ for the edges with both endpoints in $S$, and $\delta(i)$ for the set of edges incident to $i \in V$.

\begin{equation} \min\ \sum_{e \in E} w_{e} x_{e} \end{equation}
\begin{equation} \text{s.t. } \sum_{e \in E} x_{e} = k - 1 \end{equation}
\begin{equation} \sum_{i \in V} z_{i} = k \end{equation}
\begin{equation} \sum_{e \in E(S)} x_{e} \leq |S| - 1 \quad \forall S \subseteq V:\ 3 \leq |S| \leq k \end{equation}
\begin{equation} \sum_{e \in \delta(i)} x_{e} \leq M \cdot z_{i} \quad \forall i \in V \end{equation}
\begin{equation} z_{i} \leq \sum_{e \in \delta(i)} x_{e} \quad \forall i \in V \end{equation}
\begin{equation} x_{e} \in \lbrace 0, 1 \rbrace \quad \forall e \in E \end{equation}
\begin{equation} z_{i} \in \lbrace 0, 1 \rbrace \quad \forall i \in V \end{equation}
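Since the cycle-elimination constraints are exponentially many, the model has to be solved by separating them on the fly. Below is a minimal sketch (not the project's code, which also separates fractional solutions via max-flow) of this scheme with gurobipy lazy constraints: whenever an incumbent contains a cycle, the corresponding violated constraint is added.

```python
import gurobipy as gp
from gurobipy import GRB
import networkx as nx

def solve_kmst_cec(nodes, edges, weights, k):
    """nodes: vertex list; edges: list of (i, j) tuples; weights: dict edge -> weight."""
    m = gp.Model("k-MST-CEC")
    x = m.addVars(edges, vtype=GRB.BINARY, name="x")  # edge is selected
    z = m.addVars(nodes, vtype=GRB.BINARY, name="z")  # node is selected
    m.setObjective(gp.quicksum(weights[e] * x[e] for e in edges), GRB.MINIMIZE)
    m.addConstr(x.sum() == k - 1)  # a tree on k nodes has k - 1 edges
    m.addConstr(z.sum() == k)
    for i in nodes:
        incident = [e for e in edges if i in e]
        # big-M linking (M = degree of i): edges may touch i only if i is selected
        m.addConstr(gp.quicksum(x[e] for e in incident) <= len(incident) * z[i])
        m.addConstr(z[i] <= gp.quicksum(x[e] for e in incident))

    def separate(model, where):
        # For each new integer solution, look for a cycle among the selected
        # edges and, if one exists, add the violated cycle-elimination cut.
        if where != GRB.Callback.MIPSOL:
            return
        vals = model.cbGetSolution(x)
        support = nx.Graph([e for e in edges if vals[e] > 0.5])
        try:
            cycle = nx.find_cycle(support)
        except nx.NetworkXNoCycle:
            return
        S = {v for edge in cycle for v in edge}
        in_S = [e for e in edges if e[0] in S and e[1] in S]
        model.cbLazy(gp.quicksum(x[e] for e in in_S) <= len(S) - 1)

    m.Params.LazyConstraints = 1
    m.optimize(separate)
    return m
```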

Our findings were that MTZ and CEC were the best-performing formulations. A faster implementation of the separation routine of CEC (for instance, using C++ instead of Python for the max-flow computations) would probably have made this model the best overall.

Heuristic methods for the Weighted S-Plex Editing problem

In this project I worked with Grégoire de Lambertye, and the goal was to implement a range of heuristics and metaheuristics for the Weighted s-Plex Editing Problem. Here is the repository for the project: GregoireLamb/heuristic-optimization-splex

Formal problem description (a bit long)

We consider the Weighted s-Plex Editing Problem (WsPEP), a generalization of the s-Plex Editing Problem, which is itself a generalization of the Graph Editing Problem. We are given an undirected graph $G=(V,E)$, a positive integer $s$, and a symmetric weight matrix $W$ that assigns every (unordered) vertex pair $(i,j):\ i,j \in V,\ i \neq j$ a non-negative integer weight $w_{ij}$. An s-Plex of a graph is a subset of vertices $S \subseteq V$ such that each vertex of $S$ has degree at least $|S| - s$ within $S$, and there exist no edges to vertices outside the s-Plex, i.e., $i \in S, j \notin S \implies (i,j) \notin E$. Note that a clique (complete graph on a vertex subset) is a 1-plex. The goal is to edit the edges of the graph by deleting existing edges and/or inserting new edges such that the edited graph consists only of non-overlapping s-Plexes, and such that the sum of the weights of the edited edges is minimal. Let a candidate solution be represented by variables $x_{ij} \in \{0,1\},\ i,j \in V,\ i < j$, where a value of 1 indicates that edge $(i,j)$ is either inserted (if $(i,j) \notin E$) or deleted (if $(i,j) \in E$), and a value of 0 indicates that the edge is not edited. The objective function is then given by $\min f(x) = \sum_{i < j} w_{ij} x_{ij}$.

In essence, the problem is about adding or removing edges of a graph so that every connected component of the resulting graph has the s-Plex property. The edits are weighted, so you want to perform them at minimum total cost.
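To make the definition concrete, here is a small sketch (my own illustration, not the project code) that checks feasibility and computes the editing cost with networkx; the weight dictionary w, keyed by sorted vertex pairs, is hypothetical:

```python
# Feasibility check and objective for the Weighted s-Plex Editing Problem:
# every connected component of the edited graph must be an s-plex, and the
# cost is the total weight of the edited (inserted or deleted) vertex pairs.
import networkx as nx

def is_splex_partition(G: nx.Graph, s: int) -> bool:
    for component in nx.connected_components(G):
        sub = G.subgraph(component)
        # each vertex needs degree >= |S| - s inside its component
        if any(deg < len(component) - s for _, deg in sub.degree()):
            return False
    return True

def editing_cost(original: nx.Graph, edited: nx.Graph, w: dict) -> int:
    e1 = {frozenset(e) for e in original.edges()}
    e2 = {frozenset(e) for e in edited.edges()}
    # symmetric difference = edges that were inserted or deleted
    return sum(w[tuple(sorted(pair))] for pair in e1 ^ e2)

# A path on 4 nodes is not a 2-plex partition: its endpoints have degree 1 < 4 - 2.
print(is_splex_partition(nx.path_graph(4), s=2))  # False
```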

We implemented the following methods:

  • Construction Heuristics: Greedy and Randomized Greedy construction heuristics
  • GRASP: Greedy Randomized Adaptive Search Procedure on top of the Randomized Greedy construction heuristic (a minimal skeleton is sketched right after this list)
  • Local Search: Iterated Local Search with the construction heuristic as the initial solution. The neighborhood structures considered are:
    • Swap nodes: Swap two nodes in the solution
    • k-flips: Remove k edges from the solution
    • Move nodes: Move nodes from one s-Plex to another
  • VND: Variable Neighborhood Descent
  • Simulated Annealing: Simulated Annealing with the construction heuristic as the initial solution.
  • BRKGA: Biased Random Key Genetic Algorithm
  • ACO: Ant Colony Optimization
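As promised above, here is a generic skeleton of the GRASP scheme we followed; construct_randomized_greedy, local_search, and cost are problem-specific hooks and purely hypothetical here:

```python
import random

def grasp(instance, iterations: int = 100, alpha: float = 0.3, seed: int = 0):
    """Generic GRASP: repeat randomized-greedy construction + local search,
    keeping the best solution found. All helper functions are hypothetical."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        # alpha controls the restricted candidate list: 0 = purely greedy,
        # 1 = purely random (a common, but not universal, convention).
        solution = construct_randomized_greedy(instance, alpha, rng)
        solution = local_search(instance, solution)
        c = cost(instance, solution)
        if c < best_cost:
            best, best_cost = solution, c
    return best, best_cost
```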

Moreover, we performed:

  • Statistical testing to fairly identify the best methods on a set of instances
  • Hyperparameter Tuning of some of the methods using SMAC.
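As an illustration of the tuning setup, here is a minimal sketch using the SMAC3 v2 API (signatures may differ across versions; evaluate_heuristic is a hypothetical wrapper around one of our solvers):

```python
from ConfigSpace import Configuration, ConfigurationSpace
from smac import HyperparameterOptimizationFacade, Scenario

# Search space: tuples define uniform ranges (float or integer).
configspace = ConfigurationSpace({"alpha": (0.0, 1.0), "iterations": (10, 500)})

def run_heuristic(config: Configuration, seed: int = 0) -> float:
    # evaluate_heuristic (hypothetical) runs the solver with these
    # parameters and returns the objective value to minimize.
    return evaluate_heuristic(alpha=config["alpha"],
                              iterations=config["iterations"], seed=seed)

scenario = Scenario(configspace, deterministic=False, n_trials=200)
smac = HyperparameterOptimizationFacade(scenario, run_heuristic)
incumbent = smac.optimize()  # best configuration found
```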

Our conclusions and lessons learned

  • We had a lot of fun; working on well-defined optimization problems is so nice.
  • Gregoire and I work very well together.
  • There isn’t a “unified” heuristic algorithmic framework.
  • There isn’t a language to model problems that are going to be solved heuristically.
  • There isn’t a very well established package or library for heuristic methods.
  • In the end, heuristics are very problem-specific, and researchers are okay with implementing them from scratch. I find this a bit annoying.
  • SMAC is powerful, but it is so badly supported on Windows that it is unusable (as of July 2024).
  • Heuristic methods are highly dependent on algorithmic design choices. We made some questionable choices at the beginning of the project (how to represent solutions, which neighborhood structures to use), and this limited the performance of all the subsequent methods. Our methods performed well, but not perfectly. We still got the best grade.


