Several weeks ago, we met a sort problem in our program of web app. We tried kinds of methods and finally have gotten a nearly 10 times performance improvement. The problem is very interesting and worth recording.
The Problem
Given a list of items, each item with a directory and a create_time, we need to sort those items by two rules:
first rule: a given index_dict, only contains part of the items.
second rule: when the first rule is not exist, use create_time.
“/folder1”, “/folder2” and “/folder3” are not in the index_dict, so they are sorted by create_time: “/folder2” > “/folder3” > “/folder1”
then, Let’s loot at the subfolders of folder2. “/folder2/folder2-folder1” and “/folder2/folder2-folder2” are also not in the index_dict (values), so they are sorted by create_time: “/folder2/folder2-folder2” > “/folder2/folder2-folder1”
then, their subdirectories (here are files), although “/folder2/folder2-folder1/file2” > “/folder2/folder2-folder1/file1” by create_time, in the index_dict, “file1” > “file2”, so the result is ‘/folder2/folder2-folder1/file1’ > ‘/folder2/folder2-folder1/file2’.
Another example, let’s look at folder3, they have sub folders in the index_dict, so sub folders need to be sorted like that: ‘/folder3/folder3-folder1’ > ‘/folder3/folder3-folder2’, then ‘/folder3/folder3-folder4’ > ‘/folder3/folder3-folder3’, by their create_time.
We need to remind you several points again:
the directory locations might be multi-levels
the index keys might be any parent directory location, and they are out of order (dict)
the each index values are not one-to-one with the items (may be more or less)
sub folder or sub files must be created after their parent
First Naive Solution
It was my first time to use Elixir, so I quickly got an easy solution: sort by levels. Here is a small example:
First of all, we sort all the items by their create_time. Then, at each level, we first check if there is an index in the index dict. If is, that’s just the standard for the sub directories. If is not, we do nothing.
So the solution may like the following (we only list the key code).
defcombine_item_index_level(item_level, index_level) do Enum.reduce(item_level, %{}, fn {level, items}, combined_level -> index = Map.get(index_level, level, []) not_in_index = Enum.filter(items, fn x -> Enum.member?(index, x) == false end) combined = index ++ not_in_index Map.put(combined_level, level, combined) end) end
defsort_level_items(combined_level) do combined_level |> Enum.reduce(%{}, fn {level, items}, sorted_items -> standard = Map.get(sorted_items, level-1, []) sorted = Enum.sort_by(items, &Enum.find_index(standard, fn x -> x == get_parent_location(&1) end)) Map.put(sorted_items, level , sorted) end) end
defcombine_sorted_items(sorted_items) do locs_dict = %{"/" => "00000"} sorted_items |> Enum.reduce([], fn {_, items}, level_list -> [items | level_list] end) |> Enum.reverse() |> add_tag_to_location(locs_dict) |> Enum.sort_by(fn {_, tag} -> tag end) |> Enum.map(fn {loc, _} -> loc end) end
defsort(item_list, index_dict) do index_level = build_index_level(index_dict) item_list |> build_item_level() |> combine_item_index_level(index_level) |> sort_level_items() |> combine_sorted_items() end end
That seems a little complex, but easy to understand:
First, we build levels from both items and indexes, the result is a map (dict) with location (directory) level as key and location as value.
Then, we combine each level locations, make sure that index level locations are ahead of item level locations.
After that, we just sort the combined level locations by their parent location from the top level to the last level, just like what we’ve mentioned in the above simple example.
At last, we need to adjust the locations, make sure the children locations are following their parent location instead of grouped by levels.
The problem is obviously a tree problem, so we tried a tree solution with the help of Elixir Forum. Thanks to those enthusiastic guys, we found a way to build the tree, and we sorted the items in Depth-First traversal. The solution main code is here.
Then is the most import function dft_with_sort, we need to sort our items when traversing depth-first.
We need a stack to store tree nodes, at the very first, there is only one node: the root node. With the popped node, we get a current subtree and a list of remained subtrees. The key of the current subtree is then added to the result list. This is our first item. Then the values need to be sorted by our given index. So we build a sorter, which is just the standard with index items ahead of create_time sorted items.
List.first(Tuple.to_list(&1)) gets the key (location) and values then sort by sorter (standard). Finally we change the tuple back to dict and reverse the order.
Why we need to reverse the order here? Because we will pop from the last one next time. [remain | new] is the new stack, we are traversing depth-first.
Another key point is dft_with_sort([], res, _time_sorted_locs, _indexes), this is used to return our final result when stack is empty.
When we used the former solutions to our production, we were not satisfied with the performance. Because this module is used very frequently, so we need a much quicker one. With some more research, we found it — add a parent for each location, and let the tree contain the needed order. Here is our reference: elixir - Create a hierarchical data structure from a flat list with parent_id - Stack Overflow.
Look at this smart thought, we have used less code, but more clearer and faster. The dft function is just like what we’ve seen before, but we do not need to sort. The core function here is the build_tree function. It does two main things: build a tree and sort when building. We must be very familiar with the sort part, it’s almost the same as above, but much easier and clearer. Now let’s focus on the build tree part.
1 2 3 4 5 6 7 8 9 10 11 12
defmoduleTreedo defbuild_tree(list) do list |> Enum.reduce(%{}, fn foo, map -> foo = %{foo | children: Map.get(map, foo.location, [])} Map.update(map, foo.parent, [foo], fn foos -> [foo | foos] end) end) |> Map.get(nil) end end
What does this do? Let’s show you a simple example.
The meaning is to update the value of the foo.parent key from foos to [foo | foos]. For a simple instance:
1 2
iex >Map.update(%{a:1}, :a, 13, fn x -> x * 2end) %{a:2}
Now we know exactly what the function do. It first updates children, then update this update to its parent. So we should use bottom-up approach to get the full tree. We just need to reverse the given list.
You could play with the code here. Believe me, it’s very interesting.
Actually, this method can be used in many fields, especially in the tree data structure. Additionally, use id instead of the real value will always process much faster.
Summary
In this article, we have introduced a typical case of multiway tree. We have shown how to build it based on item values and their parents. We have also shown how to sort the tree by some external conditions. In fact, we believe this case could be used in many fields in your routine work.
By the way, what we have mentioned above is just a simplified sample. Actually we did many optimization in the real world. For example, we sorted the items when querying from database; we cached the result in Redis, and so on.