Checkpoint Serialization Process, Part 2

save(k)

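  • save_str
  • obj: 'model_state_dict'
  • n: 16 (the UTF-8 byte length of the key)
  • data: b'X\x10\x00\x00\x00model_state_dict' (the BINUNICODE opcode X, a 4-byte little-endian length, then the encoded bytes)
  • self.memoize(obj)
  • idx: 1, obj: 'model_state_dict'
  • id(obj): 131443899919424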
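
The byte string written by save_str can be rebuilt by hand. The snippet below is a minimal sketch that assumes pickle protocol 2, which is the protocol the opcodes in this trace suggest; the variable names are illustrative only.

```python
import pickle
import struct

key = "model_state_dict"
encoded = key.encode("utf-8")

# BINUNICODE is the opcode b'X'; it is followed by a 4-byte
# little-endian length and then the UTF-8 bytes of the string.
expected = pickle.BINUNICODE + struct.pack("<I", len(encoded)) + encoded

# The same bytes appear inside a real protocol-2 pickle of the key.
stream = pickle.dumps(key, protocol=2)
print(expected)
print(expected in stream)
```

The length byte \x10 is 16, matching n above.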

save(v)

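  • t: collections.OrderedDict
  • f: None (self.dispatch has no entry for this type)
  • Look up an optional private dispatch table (dispatch_table); no entry is found
  • Check whether obj is itself a class (a subclass of type); it is not
  • __reduce_ex__ exists, so rv = reduce(self.proto) calls it and stores the return value in rv
  • rv is a tuple
  • l = len(rv), which is 5
  • Finally, save_reduce is called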
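
The 5-tuple can be inspected directly, without going through the pickler. This is a small sketch on a toy OrderedDict; the keys a and b are made up for illustration and are not taken from the checkpoint.

```python
from collections import OrderedDict

od = OrderedDict(a=1, b=2)

# The pickler calls __reduce_ex__ with the protocol number.
rv = od.__reduce_ex__(2)

func, args = rv[0], rv[1]   # callable and its arguments
items = list(rv[4])         # iterator of (key, value) pairs, consumed by _batch_setitems

print(len(rv), func, args)
print(items)
```

Slot 0 and slot 1 become func and args in save_reduce, and slot 4 is the item iterator that is later fed to _batch_setitems.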

save_reduce

  • func: <class 'collections.OrderedDict'>
  • args: () (taken from rv; a tuple)
  • save(func)

save(func)

  • save(type)

    def save_type(self, obj):
        return self.save_global(obj)

    dispatch[FunctionType] = save_global
    dispatch[type] = save_type

  • self.save_global(obj)
  • obj/obj2: <class 'collections.OrderedDict'>
  • name/lastname: 'OrderedDict'
  • module_name: 'collections'
  • parent/module: __import__(module_name, level=0) gives <module 'collections' from '/home/dell/anaconda3/envs/torch_new_env/lib/python3.11/collections/__init__.py'>
  • write(data): b'ccollections\nOrderedDict\n' (the GLOBAL opcode c, then the module name and the object name, each terminated by a newline)
  • memo(idx(2)): b'q\x02' (BINPUT 2)
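
The effect of save_global is easy to reproduce with the standard library alone. The sketch below assumes protocol 2 and pickles the class object itself rather than an instance.

```python
import pickle
from collections import OrderedDict

# Pickling a class stores only a reference: module name plus qualified name.
stream = pickle.dumps(OrderedDict, protocol=2)
global_record = b"ccollections\nOrderedDict\n"

print(stream)
print(global_record in stream)
```

No code or attributes of the class are written; the unpickler has to import collections and look the name up again.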

save(args)

  • save_tuple(self, obj)
  • obj:()

    def save_tuple(self, obj):
        if not obj:  # tuple is empty
            if self.bin:
                self.write(EMPTY_TUPLE)
            else:
                self.write(MARK + TUPLE)
            return

  • The tuple is empty, so write(EMPTY_TUPLE): b')'
  • write(REDUCE): b'R'
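
Taken together, save(func), save(args) and REDUCE form a fixed opcode sequence. The following sketch lists the opcodes for an empty OrderedDict with pickletools, assuming protocol 2; an empty container is used so that no item opcodes follow.

```python
import pickle
import pickletools
from collections import OrderedDict

stream = pickle.dumps(OrderedDict(), protocol=2)

# genops yields (opcode, argument, position) for every opcode in the stream.
names = [op.name for op, arg, pos in pickletools.genops(stream)]
print(names)

# On load, REDUCE pops args and func from the stack and pushes func(*args).
restored = pickle.loads(stream)
```

GLOBAL pushes the class, EMPTY_TUPLE pushes the argument tuple, and REDUCE calls the class with those arguments to rebuild the object.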

memo(obj)

  • memo(idx(3)): b'q\x03'
  • id(obj):132101266175680
  • _batch_setitems

n=7 _batch_setitems

save(k)

  • write(MARK): b'('
  • save_str('embedding.weight')
  • write(b'X\x10\x00\x00\x00embedding.weight')
  • memo(b'q\x04')
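
When more than one item is pending, _batch_setitems brackets the key/value pairs between a single MARK and a SETITEMS opcode. The sketch below reproduces that framing with seven made-up keys and integer values; the key names are hypothetical, and real checkpoints hold tensors instead of integers.

```python
import pickle
import pickletools
from collections import OrderedDict

# Seven items, mirroring n = 7 in the trace (keys are invented).
state = OrderedDict((f"layer{i}.weight", i) for i in range(7))
stream = pickle.dumps(state, protocol=2)

names = [op.name for op, arg, pos in pickletools.genops(stream)]
mark_at = names.index("MARK")
setitems_at = names.index("SETITEMS")

# Everything between MARK and SETITEMS is alternating keys and values.
print(names[mark_at:setitems_at + 1])
```

The first opcode after MARK is the BINUNICODE for the first key, which corresponds to the save_str call above.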

save(v)

  • rv = reduce(self.proto); its arguments appear to include a TypedStorage. The printed rv ends with:

    -2.759326696395874
    -0.4928496479988098
    -0.9460598826408386
    0.12430991232395172
    -0.8162614107131958
    -0.46302852034568787
    2.887676239013672
    0.2881816625595093
    -0.3105534017086029
    -0.1721319556236267
    1.5114414691925049
    0.6535921096801758
    [torch.storage.TypedStorage(dtype=torch.float32, device=cpu) of size 10000], 0, (10000, 1), (1, 1), False, OrderedDict()))
  • save_reduce()
  • save(func):<function _rebuild_tensor_v2 at 0x7825cb0ff060>
  • save(args): args appears to be the argument tuple carried in rv
  • obj:

    tensor([[ 1.1894],
            [-0.5411],
            [-2.4535],
            ...,
            [-0.1721],
            [ 1.5114],
            [ 0.6536]])

save(func)

  • save_global
  • name: '_rebuild_tensor_v2'
  • module_name: 'torch._utils'
  • write(b'ctorch._utils\n_rebuild_tensor_v2\n')
  • memo(b'q\x05')
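
A function is pickled by reference in the same way as a class, so the stream only names it. The sketch below uses json.dumps as a stand-in for _rebuild_tensor_v2 so that it runs without PyTorch; that substitution is an assumption made purely for illustration.

```python
import json
import pickle

# A module-level function is written as GLOBAL: b'c' + module + b'\n' + name + b'\n'.
stream = pickle.dumps(json.dumps, protocol=2)
print(stream)

# Loading re-imports the module and returns the very same function object.
restored = pickle.loads(stream)
print(restored is json.dumps)
```

The function name appears once in the record, and loading a checkpoint therefore requires that torch._utils._rebuild_tensor_v2 be importable in the loading environment.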

save(args)

  • save_tuple
  • write(MARK): b'('
  • save(element)

persistent_id

    if isinstance(obj, torch.storage.TypedStorage):
        # TODO: Once we decide to break serialization FC, this case
        # can be deleted
        storage = obj._untyped_storage
        print(obj._untyped_storage)  # debug print added for this trace
        storage_dtype = obj.dtype
        storage_type_str = obj._pickle_storage_type()
        storage_type = getattr(torch, storage_type_str)
        storage_numel = obj._size()

  • storage_dtype:torch.float32
  • storage_type_str: 'FloatStorage'
  • storage_type: <class 'torch.FloatStorage'>
  • storage_numel:10000
  • storage_key=0

    def save_pers(self, pid):
        # Save a persistent id reference
        if self.bin:
            self.save(pid, save_persistent_id=False)
            self.write(BINPERSID)
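
The persistent-id mechanism can be tried out with the standard library alone. The following is a minimal sketch, not PyTorch's implementation: FakeStorage, StoragePickler, StorageUnpickler and the layout of the pid tuple are all hypothetical names chosen to echo the fields seen in the trace (type name, key, element count).

```python
import io
import pickle
import pickletools


class FakeStorage:
    """Hypothetical stand-in for a tensor storage; not a PyTorch class."""

    def __init__(self, data):
        self.data = data


class StoragePickler(pickle.Pickler):
    def __init__(self, file, protocol):
        super().__init__(file, protocol)
        self.saved = {}  # key -> payload, kept outside the pickle stream

    def persistent_id(self, obj):
        if isinstance(obj, FakeStorage):
            key = str(len(self.saved))
            self.saved[key] = obj.data
            # Only this small tuple is pickled, followed by BINPERSID.
            return ("storage", "FakeStorage", key, len(obj.data))
        return None  # every other object is pickled normally


class StorageUnpickler(pickle.Unpickler):
    def __init__(self, file, saved):
        super().__init__(file)
        self.saved = saved

    def persistent_load(self, pid):
        tag, type_name, key, numel = pid
        return FakeStorage(self.saved[key])


buf = io.BytesIO()
pickler = StoragePickler(buf, 2)
pickler.dump({"w": FakeStorage([1.0, 2.0, 3.0])})
stream = buf.getvalue()

opcode_names = [op.name for op, arg, pos in pickletools.genops(stream)]
restored = StorageUnpickler(io.BytesIO(stream), pickler.saved).load()
print(opcode_names)
```

The storage payload never enters the pickle stream; only the identifying tuple does. That separation is what lets the raw storage bytes be written elsewhere and looked up by key on load.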

http://sjx.com/2024/12/11/检查点序列化过程-2/
Author
sjx
Published on
December 11, 2024
License