wiki:bfsvsdfs

# Differences

This shows you the differences between two versions of the page.

 — wiki:bfsvsdfs [2020/03/09 01:33] (current)abigpickle created 2020/03/09 01:33 abigpickle created 2020/03/09 01:33 abigpickle created Line 1: Line 1: + +

Preface

+

Hey there. This post will cover two of the most important algorithms in computer science, breadth-first search and depth-first search. It will not, however, teach you the basics of programming. If you do not already know how to program, there are various links in this wiki that will take you to those kinds of resources. This assumes you already know how to program, and the language used will be C++. With that out of the way let's begin.

+

What the hell are these?

+

BFS and DFS are, as suggested by their names, searching algorithms. They allow you to find the shortest path or at least a path in a grid or graph-like structure. Although their approach to doing this is different, they fundamentally achieve the same goal and in most cases, at the same time complexity. Pathfinding in this context can also have different meanings. Algorithms like DFS can be used to solve puzzles, mazes, and other problems that require a brute force search through different possible paths of attack. As a plus side, they are also very easy to understand and implement as they take advantage of two powerful data structures; stacks and queues.

+

DFS

+

Let's start with DFS. Suppose we are on a grid.  We start at the top left corner, and we want to reach the bottom left corner in a path that only goes down and right for simplicity. In this grid, there are obstacles placed in cells that cannot be passed through. As a final caveat, our eyesight is really bad and we can only see cells directly around us. With these constraints, how would we find a path?

+

+

Well, we can think of a very simple strategy. We continue moving forward in a single direction until we either got to the exit, hit a wall, or an obstacle. In the latter two, all we need to do is check the next possible direction (we only have 2, down and right). If that fails, we know that path is a dud. We can just backtrack our steps to the cell we last were, and repeat. This process of repeatedly traversing down a single "branch" of the structure and backtracking if it fails is the DFS algorithm.

+

You might be able to see why it's called depth-first search now. We are travelling down the virtual depth of this data structure, which happens to be a 2D grid. But this approach can work for many different problems. In a sudoku puzzle, we can try out a number, then the next, and the next, until we find out this number we tried does not agree with the rules of the game; in which case we can simply backtrack back to the last position in which it did work.

+

Implementation (and limitations)

+

The DFS algorithm works with the stack data structure. If you don't know what a stack is, it's basically a growing and shrinking block of memory where items and pushed on top of each other and later popped off, like a physical stack of elements. Stacks are crucial to the algorithm as they are what allow the backtracking to work. At its simplest, this can be implemented using a recursive function call. This way, we use the program's stack (which is blazingly fast) as the stack for the process. Unfortunately, this can also be a major crutch. If the data set is very large, repeated recursive calls will quickly overflow the stack and cause your program to crash. Another major limitation with this algorithm is that unless every single possible path is tested, it does not guarantee the shortest path to the destination. Below is a sample recursive function for an H x W grid that could solve this problem.

+
//This function takes in a grid of size H x W,
+                                        //which has cells marked false if they have obstacles and the
+                                        //current position of the checking cell, and returns whether there is a path or not.
+                                        bool dfs_find_path(bool** grid, int H, int W, int X, int Y) {
+                                        if (X == H - 1 && Y == W - 1)
+                                        //If the cell is the goal cell, we return true and say there is a path.
+                                        return true;
+
+                                        if (X >= H || Y >= W)
+                                        //If the cell is out of the grid, we return false and say there is no path from here.
+                                        return false;
+
+                                        if (!grid[X][Y])
+                                        //If this cell is an obstacle, just like the first we return false with no path.
+                                        return false;
+
+                                        //If all checks are clear,
+                                        //we can then return whether either the right or down has a path.
+                                        //This way, if we start from cell (1, 1), the top left,
+                                        //we can ensure we know whether there is at least one path or not.
+                                        return dfs_find_path(grid, H, W, X + 1, Y) || dfs_find_path(grid, H, W, X, Y + 1);
+
+                                        }
+

BFS

+

New problem. Say we are again on a grid, starting at any arbitrary point (x, y). Our goal is to travel to a point (m, n), which resides somewhere on this grid. We do know where this point is until we have reached it. There are also again obstacles blocking our path, placed in this grid at some cells. To make matters worse, there is no longer a restriction on which direction you can move. Left, right, up, down. All valid. How can we find the shortest path to the destination cell?

+

Seems like quite the upgrade. This problem is indeed solvable using DFS if we make it try every single possible route, returning the shortest one it has found, but that's not very efficient. Not at all. Instead, we need a new strategy. Enter, breadth-first search. Breadth is essentially another word for width, and that can clue us in on what this does. Instead of travelling down the entire depth of a path when we search, we search in a sweeping like motion from our starting node. Also unlike DFS, if we stop once we found the destination BFS guarantees us the shortest path. This is also sometimes called a flood fill.

+

Cool. How does it work?

+

Glad you asked. While DFS uses a stack, BFS takes advantage of the queue. Instead of picking up the top of a stack of elements, you take from the elements that were placed first. Making it like a queue. At every cell we check, we first do our preliminary checks to see if it's out of the grid or the destination and all that, then if none of those passes, we throw the neighbouring nodes (or cells) onto the queue for later checking. It is also crucial we mark cells as "already checked" so that we don't recheck cells and keep going in loops. If you think about it conceptually, since we are throwing everything on a queue, we go out in a sweeping like motion. First, we check four paths. Then from those four paths, we check all the paths next to them. And so on. Because of this sweeping motion, at every moment in time, the current "level" of the queue we are checking is the same distance away from the starting node (in an unweighted graph). This means once the destination is found, we can safely assume that there is no faster way of reaching this point.

+

Sounds much better than the DFS. What's the catch?

+ Both algorithms have their own use case. The main problem BFS suffers from is if the end goal is very far from the start. Since BFS goes out in a sweeping motion, it could effectively check the entire grid before it reaches the destination, which DFS can reach it almost instantly in one path. BFS can also not be applied to as many problems as DFS, as it's more suited to physical pathfinding than puzzles. Here's a gif showing it in action:

+

+

Sample BFS code

+ BFS is also very simple to implement using an STL queue and does not make any recursive calls. For simplicity, this only returns whether there is a path or not, and has the destination hardcoded, but can easily be adapted to return for example the length of the shortest path. Also fair warning I have not tested this to make sure it even compiles, but you should get the idea of what it's doing. Code below:

+
//Function takes in an H x W grid and a starting position (X, Y) and returns if there as a
+                                        //path to the bottom right cell or not using BFS
+                                        bool bfs_has_path(bool** grid, int H, int W, int X, int Y){
+                                        //Queue to hold our cells
+                                        queue<pair<int, int>> bfs;
+
+                                        //Add the first node to the queue
+                                        bfs.push(make_pair(X, Y));
+
+                                        //Keep going through the queue until it is exhausted all its elements
+                                        while(!bfs.empty()) {
+                                        //Take cell from the front of the queue
+                                        auto p = bfs.front();
+                                        bfs.pop();
+
+                                        //If it's blocked or visited, go to the next cell
+                                        if (!grid[p.first][p.second])
+                                        continue;
+                                        grid[p.first][p.second] = false; //Set it to visited
+
+                                        //If it's the destination return that there is a path found
+                                        if (p.first == H - 1 && p.second == W - 1)
+                                        return true;
+
+                                        //Otherwise check all the neighbouring cells and add them to the queue
+                                        if (p.first + 1 < H)
+                                        bfs.push(make_pair(p.first + 1, p.second));
+                                        if (p.first - 1 >= 0)
+                                        bfs.push(make_pair(p.first - 1, p.second));
+                                        if (p.second + 1 < W)
+                                        bfs.push(make_pair(p.first, p.second + 1));
+                                        if (p.second - 1 >= 0)
+                                        bfs.push(make_pair(p.first, p.second - 1));
+                                        }
+                                        //If we have exhausted all the possible cells, we can safely assume there is no path.
+                                        return false;
+
+                                        }
+

Conclusion

+

Hopefully, by now, you have at least some understanding of what breadth-first and depth-first search are, and some use cases. These two algorithms are fundamental in solving many different classes of CS problems, so I definitely suggest getting some practice in using them.

+

'til next time. -pickle

+ 