wiki:bfsvsdfs

Hey there. This post will cover two of the most important algorithms in computer science, **breadth-first search** and **depth-first search**. It will not, however, teach you the basics of programming. If you do not already know how to program, there are various links in this wiki that will take you to those kinds of resources. This assumes you already know how to program, and the language used will be C++. With that out of the way let's begin.

BFS and DFS are, as suggested by their names, searching algorithms. They allow you to find the shortest path or at least *a* path in a grid or graph-like structure. Although their approach to doing this is different, they fundamentally achieve the same goal and in most cases, at the same time complexity. Pathfinding in this context can also have different meanings. Algorithms like DFS can be used to solve puzzles, mazes, and other problems that require a brute force search through different possible paths of attack. As a plus side, they are also very easy to understand and implement as they take advantage of two powerful data structures; stacks and queues.

Let's start with DFS. Suppose we are on a grid. We start at the top left corner, and we want to reach the bottom left corner in a path that only goes down and right for simplicity. In this grid, there are obstacles placed in cells that cannot be passed through. As a final caveat, our eyesight is really bad and we can only see cells directly around us. With these constraints, how would we find a path?

Well, we can think of a very simple strategy. We continue moving forward in a single direction until we either got to the exit, hit a wall, or an obstacle. In the latter two, all we need to do is check the next possible direction (we only have 2, down and right). If *that* fails, we know that path is a dud. We can just backtrack our steps to the cell we last were, and repeat. This process of repeatedly traversing down a single "branch" of the structure and backtracking if it fails is the DFS algorithm.

You might be able to see why it's called **depth**-first search now. We are travelling down the virtual depth of this data structure, which happens to be a 2D grid. But this approach can work for many different problems. In a sudoku puzzle, we can try out a number, then the next, and the next, until we find out this number we tried does not agree with the rules of the game; in which case we can simply backtrack back to the last position in which it *did* work.

The DFS algorithm works with the stack data structure. If you don't know what a stack is, it's basically a growing and shrinking block of memory where items and pushed on top of each other and later popped off, like a physical stack of elements. Stacks are crucial to the algorithm as they are what allow the backtracking to work. At its simplest, this can be implemented using a recursive function call. This way, we use the program's stack (which is blazingly fast) as the stack for the process. Unfortunately, this can also be a major crutch. If the data set is very large, repeated recursive calls will quickly overflow the stack and cause your program to crash. Another major limitation with this algorithm is that unless every single possible path is tested, it does not guarantee the shortest path to the destination. Below is a sample recursive function for an H x W grid that could solve this problem.

```
//This function takes in a grid of size H x W,
//which has cells marked false if they have obstacles and the
//current position of the checking cell, and returns whether there is a path or not.
bool dfs_find_path(bool** grid, int H, int W, int X, int Y) {
if (X == H - 1 && Y == W - 1)
//If the cell is the goal cell, we return true and say there is a path.
return true;
if (X >= H || Y >= W)
//If the cell is out of the grid, we return false and say there is no path from here.
return false;
if (!grid[X][Y])
//If this cell is an obstacle, just like the first we return false with no path.
return false;
//If all checks are clear,
//we can then return whether either the right or down has a path.
//This way, if we start from cell (1, 1), the top left,
//we can ensure we know whether there is at least one path or not.
return dfs_find_path(grid, H, W, X + 1, Y) || dfs_find_path(grid, H, W, X, Y + 1);
}
```

New problem. Say we are again on a grid, starting at any arbitrary point (x, y). Our goal is to travel to a point (m, n), which resides somewhere on this grid. We do know where this point is until we have reached it. There are also again obstacles blocking our path, placed in this grid at some cells. To make matters worse, there is no longer a restriction on which direction you can move. Left, right, up, down. All valid. How can we find the *shortest* path to the destination cell?

Seems like quite the upgrade. This problem is indeed solvable using DFS if we make it try every single possible route, returning the shortest one it has found, but that's not very efficient. Not at all. Instead, we need a new strategy. Enter, breadth-first search. Breadth is essentially another word for width, and that can clue us in on what this does. Instead of travelling down the entire depth of a path when we search, we search in a sweeping like motion from our starting node. Also unlike DFS, if we stop once we found the destination BFS guarantees us the shortest path. This is also sometimes called a flood fill.

Glad you asked. While DFS uses a stack, BFS takes advantage of the queue. Instead of picking up the top of a stack of elements, you take from the elements that were placed first. Making it like a queue. At every cell we check, we first do our preliminary checks to see if it's out of the grid or the destination and all that, then if none of those passes, we throw the neighbouring nodes (or cells) onto the queue for later checking. It is also crucial we mark cells as "already checked" so that we don't recheck cells and keep going in loops. If you think about it conceptually, since we are throwing everything on a queue, we go out in a sweeping like motion. First, we check four paths. Then from those four paths, we check all the paths next to them. And so on. Because of this sweeping motion, at every moment in time, the current "level" of the queue we are checking is the same distance away from the starting node (in an unweighted graph). This means once the destination is found, we can safely assume that there is no faster way of reaching this point.

img src: https://www.codeabbey.com/index/task_view/breadth-first-search

```
//Function takes in an H x W grid and a starting position (X, Y) and returns if there as a
//path to the bottom right cell or not using BFS
bool bfs_has_path(bool** grid, int H, int W, int X, int Y){
//Queue to hold our cells
queue<pair<int, int>> bfs;
//Add the first node to the queue
bfs.push(make_pair(X, Y));
//Keep going through the queue until it is exhausted all its elements
while(!bfs.empty()) {
//Take cell from the front of the queue
auto p = bfs.front();
bfs.pop();
//If it's blocked or visited, go to the next cell
if (!grid[p.first][p.second])
continue;
grid[p.first][p.second] = false; //Set it to visited
//If it's the destination return that there is a path found
if (p.first == H - 1 && p.second == W - 1)
return true;
//Otherwise check all the neighbouring cells and add them to the queue
if (p.first + 1 < H)
bfs.push(make_pair(p.first + 1, p.second));
if (p.first - 1 >= 0)
bfs.push(make_pair(p.first - 1, p.second));
if (p.second + 1 < W)
bfs.push(make_pair(p.first, p.second + 1));
if (p.second - 1 >= 0)
bfs.push(make_pair(p.first, p.second - 1));
}
//If we have exhausted all the possible cells, we can safely assume there is no path.
return false;
}
```

Hopefully, by now, you have at least some understanding of what breadth-first and depth-first search are, and some use cases. These two algorithms are fundamental in solving many different classes of CS problems, so I definitely suggest getting some practice in using them.

'til next time. -pickle

wiki/bfsvsdfs.txt · Last modified: 2020/03/09 01:33 by abigpickle