Wednesday, June 18, 2008

Checksums in 3ds Max (part 1 of 2)

In my Calling Python from MaxScript post I mentioned the usefulness of checksums in Tech Art work. I was hoping to elaborate on that a bit.

In short, a checksum is a number computed from a larger piece of data. The checksum is (ideally) guaranteed to be unique for that data. Let's say that data is this string: "Tech Art is A-#1 Supar", and the checksum you've computed is "30532". If any character in that string changes, the computed checksum for it will be different, like "18835" or "1335"... basically anything other than "30532".

Checksums are most useful in cases where you need to compare two sets of data to see if they differ, but don't care where or how they differ. If you have a short number that uniquely identifies a huge piece of data, you can compare it to other data sets much faster/easier than comparing every element of the original data. If you're hip to how slow n-squared searches can be (especially in languages like MEL or MaxScript), this is a classic method for avoiding them.

Here's a MaxScript function that takes a string of any length and returns a checksum for it.

------------------------------------------------------------
-- (str)getChecksum (string)val (int)size:256
--
-- Description:
-- Calculates simple checksum value from supplied string (or
-- any value convertable to a string).  Default size is 256,
-- but can be changed with the optional "size" parameter.
------------------------------------------------------------
fn getChecksum val size:256 = (
   if (classof val != String) then (
      try (
         val = val as String
      ) catch (
         return false
      )
   )
   alphaKey = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 !@#$%^&*()[]\\{};':\",./<>?"
   total = 0
   for i in 1 to val.count do (
      thisVal = findString alphaKey val[i]
      if (thisVal == undefined) then (
         thisVal = 0
      )
      -- Multiply the alphanumeric value by its position in
      -- the input string, add to running total
      total += (thisVal * i)
   )
      -- make sure divisor is smaller than dividend
   while (total < size) do (
      total = total * 2
   )

   -- Return final checksum value
   checksum = mod total size
   return (checksum as string)
)
We used the above function in several of the Saints Row tools, primarily to help remove identical materials in 3ds Max scenes. It's very unscientific, however, and can generate collisions in rare cases (two different input strings that generate the same output checksum) **.

If you don't mind a little more setup, I would recommend an alternate checksum method. Python natively offers more robust checksum tools, and can be set up to be called directly from MaxScript. The steps for doing this, and the actual MD5 checksum function, are covered in my earlier blog, Calling Python from MaxScript.

Start with the Python script from that blog that defines and registers the COM server. Then you're able to use a far-shorter MaxScript function to get checksums:

fn getChecksum val = (
   comObj = createOLEObject "PythonCom.Utilities" 
   checksum = comObj.checksumMD5 val
   return checksum
)
That's it. The checksums you get from this function will create fewer collisions than the pure-MaxScript one above, and can be made to use any alternative method available in Python.

Now you know more about checksums, and how to generate them in 3ds Max. Next time (in Part 2) I'll discuss how you can use them to save memory in both Max and your game engine.

11 comments:

Rob Galanakis said...
This comment has been removed by the author.
Anonymous said...

Correct me if I'm wrong, but are he caps in the alphakey not a bit useless? As far as I know, findString is case insensitive.

For example, both "aaa" and "AAA" produce a checksum of 128.0.

Adam Pletcher said...

Right you are, good catch. Probably not helping the collisions issue, is it? :) Another reason not to use that implementation.

Unknown said...

Is MD5 actually doable internally in Maxscript? If not why is that? I was going to start looking into it myself.

Anonymous said...

You can use "matchpattern" instead of findstring to get round the case sensitive issue.

Adam Pletcher said...

@matt clark,
MaxScript has no native checksum methods, MD5 or otherwise. It's possible you could do something via .NET in the newer releases, but I haven't looked.

@anonymous,
You could make something work with matchPattern, yeah. Although that only returns true/false and what we're really after is the char's position in the key string. You'd need to work in subString or something similar on top of it.

loocas said...

You can easily utilize .NET classes for such a task.

Here's a sample code for hashing your strings using the MD5 algorithm returning a byte array:

md5Class = dotNetClass "System.Security.Cryptography.MD5"
encoding = dotNetClass "System.Text.Encoding"

myString = "DUBER.CZ"

MD5 = md5Class.Create()
MD5.ComputeHash(encoding.ASCII.GetBytes(myString))

I'll look deeper into the whole .NET hashing methods as this isn't the same hex value you can get from Python. But it should give you an idea about accessing .NET classes and using their properties :)

Hope it helps, cheers,

- loocas

loocas said...

Oks, when I posted the reply (very late at night over here :D ) I realized that the returned array was actually the same as it would be from Python's MD5 has function, the only difference is that the array is in a decimal form.

A simple function for re-formatting the array of decimals into a whole string representing a hexadecimal conversion will do. Just store the array in a variable called "hashArr" and convert it like so:

for o in hashArr do
(
tmpTxt += formattedPrint o format:"x"
)

This is the result of a MD5 has for "DUBER.CZ" from within MAXScript:

"c1e88352c1c25c5aa01d6d504e8d8d39"

and this is Python's MD5 has output:

"c1e88352c1c25c5aa01d6d504e8d8d39"

Trevor said...

You could also do it using Pythons built-in hashlib. This will compute an MD5 digest which will match with other MD5 tools out there.

import hashlib
def strChecksum(s):
"""Return checksum for the given string"""
m = hashlib.md5()
m.update(s)
return m.hexdigest()

Adam Pletcher said...

@Trevor,
That's exactly what I did. Look above, from with the paragraph starting with "If you don't mind a little more setup..."

Unknown said...

Thank you so much for this! I don't have time to try and understand it atm but it took my scene from 1200 materials down to 12 - possibly saving years of my life.

Much appreciated!!