Function Hashing - A Simple Idea with Powerful Implications (Unison)

Last year I came across an ingenious concept implemented in the Unison programming language. It’s such a simple yet underutilized idea, and I can’t help but imagine how nice it could be if adopted by mainstream languages like Python, Rust, or C++. Of course, making it work in those ecosystems would be a monumental challenge, especially with ABI compatibility and other complexities. Sadly, Unison remains a mostly unknown and experimental language, not intended for production. I hope other languages will discover this idea and implement it. I even explored the Python codebase to see if I could add it, but it would require far more time than I currently have.

The Idea

The core idea is straightforward: for every function you write, Unison compute a hash based on its body and parameters. As long as neither changes, the hash remains the same.

This seemingly minor tweak has profound implications:

  1. Faster Builds Instead of rebuilding the entire abstract syntax tree (AST) during compilation, you only need to rebuild the parts where hashes have changed. This can dramatically speed up build times, especially for large projects.
  2. Version Compatibility Functions with different bodies or parameters naturally produce different hashes. This makes managing versions trivial - functions from version 1 and version 2 can coexist seamlessly since their hashes distinguish them.
  3. Distributed Computing Imagine a distributed system where only the necessary functions are fetched on demand. When a remote machine needs to execute code, it can pull the required functions by their hashes, either from a main server or peer-to-peer from other machines. This approach minimizes unnecessary data transfer and enables efficient code sharing, reducing the need for complex distributed programming strategies during development.

The idea has even more intriguing implications for how software can be written, shared, and executed. For a deeper dive, I highly recommend checking out Unison’s explanation.

I extrapolated some other uses:

  1. Faster Builds and Incremental Deployment Beyond just faster builds, this idea could enable an incremental deployment system, including hot-swapping code at runtime. For example, in Python, imagine a scenario where your code fails after 3 hours of processing due to a critical error in a method. Instead of restarting the program, a mechanism could save the current state of execution, allow minor changes to be reverted (not to be confounded with reversible computing which is on its way, maybe), and allow the developer to upload a new version of the function to continue execution without restarting.
  2. Immutable Function Repositories A function hash serves as a unique identifier, enabling the creation of immutable repositories of functions, akin to how Git tracks code. Instead of traditional package managers (e.g., pip or npm), repositories could fetch functions or modules directly by their hashes, ensuring precise versioning and eliminating dependency conflicts. This would shift software development from a project-based approach to a function-based one. While more suitable for pure functions, this idea could be extended to entire immutable objects, where full pure objects could also be hashed.
  3. Secure software Hashes ensure that function bodies remain unaltered by third parties. Shared repositories of hashed functions would allow everyone to verify the integrity of code. Organizations could maintain whitelists of vetted functions to deploy only approved code, leaving unchecked functions behind, thus enhancing software security.
  4. GPU Compute version GPU code is stored on the GPU and linked by the main program. Using hashes, we could dynamically relink different versions of GPU code at runtime, allowing hybrid functionalities between versions. For example, if a GPU doesn’t fully support the GPU version 2 firmware, the program could fallback to a hybrid setup - using foo from version 1 and bar from version 2 - adapting execution to hardware constraints.
  5. Dynamic Programming and Memoization Instead of manually crafting a data structure for memoization, hashes could enable automatic memoization. The system could associate already executed parameters with the hash of the function, streamlining dynamic programming tasks.
  6. Other ideas: Better Debugging, Function hashes could be logged during execution, enabling precise identification of function versions across builds. No more wondering whether a function changed between runs. Reduced Code Duplication, With smart semantic algorithms that interpret functions as syntactic trees, even if the same logic is rewritten in different styles, the system could generate the same hash, reducing redundant code.

I can only begin to imagine the countless implications this simple idea might have.




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • The Evolution of Scientific Curiosity - From Triangles to Gaussians in Graphics
  • Inquiry-Based Learning - Navigating the Path to Deep Understanding in Class
  • Data-Oriented Programming in Rust
  • From Print to Pixel - Transforming Acronym Usage in Academic Writing
  • Docker Dependency Magic - Unleashing sed to Tailor Your Packages