hash - python - 如何从hashset中添加和检索字符串的时间复杂度

假设我们一组长字符串添加到hashset,然后测试一些字符串是否已经存在,

例如,如果我们有三个字符串。


s1 = 'abcdefghijklmn'


s2 = 'dalkfdboijaskjd'


s3 = 'abcdefghijklmn'



然后我们做:


pool = set()


pool.add(s1)


pool.add(s2)


print s3 in pool # => True


print 'zzzzzzzzzz' in pool # => False



时间:

维基是你的朋友

https://wiki.python.org/moin/TimeComplexity

上面的操作是一个O(1)的set

平均情况是O(1),但是最糟糕的情况是O(n),其中n是集合中元素的数量,这种情况是由哈希冲突引起的,你可以在这里了解更多https://www.geeksforgeeks.org/internal-working-of-set-in-python/

一个快速而不科学的基准:


import time



def make_string(c, n):


 return c * n



def make_tuple(el, n):


 return (el,) * n



def hashtest(gen, n):


 # First compute how long generation alone takes


 gen_time = time.perf_counter()


 for x in range(n):


 gen()


 gen_time = time.perf_counter() - gen_time



 # Then compute how long hashing and generation takes


 hash_and_gen_time = time.perf_counter()


 for x in range(n):


 hash(gen())


 hash_and_gen_time = time.perf_counter() - hash_and_gen_time



 # Return the two


 return (hash_and_gen_time, gen_time)



for gen in (make_string, make_tuple):


 for obj_length in (10000, 20000, 40000):


 t = f"{gen.__name__} x {obj_length}"


 # Using `b'hello'.decode()` here to avoid any cached hash shenanigans


 hash_and_gen_time, gen_time = hashtest(


 lambda: gen(b"hello".decode(), obj_length), 10000


 )


 hash_time = hash_and_gen_time - gen_time


 print(t, hash_time, obj_length / hash_time)



输出


make_string x 10000 0.23490356100000004 42570.66158311665


make_string x 20000 0.47143921999999994 42423.284172241765


make_string x 40000 0.942087403 42458.905482254915


make_tuple x 10000 0.45578034300000025 21940.393335480014


make_tuple x 20000 0.9328520900000008 21439.62608263008


make_tuple x 40000 1.8562772150000004 21548.505620158674



基本上说哈希序列是字符串或元组,是线性时间,但是哈希字符串比散列组快得多。


import time


s = ('x' * 500_000_000)


t0 = time.perf_counter()


a = hash(s)


t1 = time.perf_counter()


print(t1 - t0)


t0 = time.perf_counter()


b = hash(s)


t2 = time.perf_counter()


assert a == b


print(t2 - t0)



输出


0.26157095399999997


1.201999999977943e-06



...