{"id":62186,"date":"2022-05-16T14:54:08","date_gmt":"2022-05-16T05:54:08","guid":{"rendered":"https:\/\/smilegate.ai\/?p=62186"},"modified":"2022-05-18T15:36:42","modified_gmt":"2022-05-18T06:36:42","slug":"tpu%ec%97%90%ec%84%9c-huggingface-model-%ed%95%99%ec%8a%b5%ed%95%98%ea%b8%b0","status":"publish","type":"post","link":"https:\/\/smilegate.ai\/cn\/2022\/05\/16\/tpu%ec%97%90%ec%84%9c-huggingface-model-%ed%95%99%ec%8a%b5%ed%95%98%ea%b8%b0\/","title":{"rendered":"TPU\uc5d0\uc11c HuggingFace model \ud559\uc2b5\ud558\uae30"},"content":{"rendered":"

[Virtual Human Research Team, 황준선]


**Introduction to TPU**

TPU (Tensor Processing Unit) is hardware announced by Google that is specialized for tensor operations. TPUs accelerate the matrix multiplications needed to train AI models and are known to train models faster than conventional GPUs.

TPUs have been released up to v4 so far, with the specifications shown below. GCP (Google Cloud Platform) currently offers up to TPU v3; for v4 you need to contact a representative.

\"\"
[\ud45c 1] TPU \ubc84\uc804 \ubcc4 \uc131\ub2a5 \ube44\uad50 – \ucc38\uace0<\/a><\/figcaption><\/figure>\n\n\n\n

**Setting up the TPU training environment**

The optimized way to train on TPU, and the one with the best training speed, is usually to upload TFRecord files to GCS (Google Cloud Storage) and train the model with TensorFlow + Keras. PyTorch is supported as well, but its training speed lags behind TensorFlow's. This post therefore sets up the training environment with TensorFlow.

Before moving on: where the meaning of a variable or function is obvious, parts of the code have been omitted.


**Connecting to the TPU**

In TensorFlow 2.0 and later, you can use the TPU through a tf.distribute.TPUStrategy() object. Like tf.distribute.Strategy(), you can think of it as the policy that configures distributed training. The connection to the TPU is also made through this object, as in the code below; in addition, you need the address of your TPU.

```python
import tensorflow as tf

tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
strategy = tf.distribute.TPUStrategy(tpu)
```
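The value of tpu_address depends on how the TPU was provisioned. The snippet below is only a sketch of common cases; the node name "my-tpu-node" is a placeholder, and the COLAB_TPU_ADDR variable exists only on Colab TPU runtimes.

```python
import os

# Sketch: common ways to obtain tpu_address (assumptions, not part of the original post).
if "COLAB_TPU_ADDR" in os.environ:
    # Colab TPU runtimes expose the gRPC endpoint through this environment variable.
    tpu_address = "grpc://" + os.environ["COLAB_TPU_ADDR"]
else:
    # Name of your Cloud TPU node (placeholder); on a TPU VM you can usually pass "local".
    tpu_address = "my-tpu-node"
```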

**Preparing the dataset**

TFRecords are stored in GCS and loaded from there. How to build TFRecords with GCP's Dataflow will be covered in a future post.
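As context for the parsing code below, here is a minimal sketch of how such a TFRecord could be written directly with tf.io.TFRecordWriter (not the Dataflow pipeline mentioned above; the gs:// path, the example texts, and the GPT-2 tokenizer are assumptions):

```python
import tensorflow as tf
from transformers import GPT2Tokenizer

# Assumptions: a GCS bucket you control and a GPT-2 tokenizer; replace with your own.
output_path = "gs://my-bucket/data/train.tfrecord"
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
texts = ["hello tpu world", "training a language model on tpu"]

with tf.io.TFRecordWriter(output_path) as writer:
    for text in texts:
        input_ids = tokenizer.encode(text)
        # One Example per sample, using the same "input_ids" field that parse_example() reads.
        example = tf.train.Example(features=tf.train.Features(feature={
            "input_ids": tf.train.Feature(int64_list=tf.train.Int64List(value=input_ids)),
        }))
        writer.write(example.SerializeToString())
```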

```python
def parse_example(serialized_example):
    # Parse one serialized TFRecord example into a dense int32 tensor of token ids.
    data_fields = {
        "input_ids": tf.io.VarLenFeature(tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized_example, data_fields)
    inputs = tf.sparse.to_dense(parsed["input_ids"])
    inputs = tf.cast(inputs, tf.int32)
    return inputs


def input_fn(tf_records,
             max_epochs,
             batch_size,
             is_training,
             padding_values,
             buffer_size=10000):
    if type(tf_records) is str:
        tf_records = [tf_records]
    dataset = tf.data.TFRecordDataset(tf_records, buffer_size=buffer_size)

    if is_training:
        dataset = dataset.shuffle(buffer_size=buffer_size)
        dataset = dataset.repeat()

    dataset = dataset.map(parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    dataset = dataset.padded_batch(batch_size, padding_values=padding_values)
    dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return dataset


# Three constants: total number of training samples, steps per epoch, per-replica batch size.
train_dataset_size = len(list(tf.data.TFRecordDataset(train_data_paths)))
steps_per_epoch = (train_dataset_size // batch_size
                   if train_dataset_size % batch_size == 0
                   else train_dataset_size // batch_size + 1)
per_replica_batch_size = max(batch_size // strategy.num_replicas_in_sync, 1)

train_dataset = strategy.experimental_distribute_datasets_from_function(
    lambda _: input_fn(train_data_paths,
                       batch_size=per_replica_batch_size,
                       max_epochs=max_epochs,
                       is_training=True,
                       padding_values=tokenizer.pad_token_id))
train_iterator = iter(train_dataset)
```

Preparing the dataset requires three constants: the number of samples in the full training set, the total number of steps per epoch, and the per-replica batch size. The training data loader is then built by wrapping the function that returns the dataset with strategy.experimental_distribute_datasets_from_function(). This wrapper is not used when training with Keras' model.fit(); in that case you only declare and use the function that returns the dataset. parse_example() parses each TFRecord example according to data_fields. To keep the example simple, only the 'input_ids' field is parsed here, but any field the model's forward pass needs, such as 'attention_mask', 'token_type_ids', or 'position_ids', can be parsed and returned as well (a sketch follows below). With this in place, train_dataset can serve as a data loader through the iter() function.
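As a rough illustration of parsing more fields (assuming the TFRecords were written with matching feature names, which is not shown in this post), parse_example() could be extended to return a dict:

```python
def parse_example_multi(serialized_example):
    # Hypothetical extension of parse_example(): several fields returned as a dict.
    data_fields = {
        "input_ids": tf.io.VarLenFeature(tf.int64),
        "attention_mask": tf.io.VarLenFeature(tf.int64),
        "token_type_ids": tf.io.VarLenFeature(tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized_example, data_fields)
    return {
        name: tf.cast(tf.sparse.to_dense(value), tf.int32)
        for name, value in parsed.items()
    }
```

When the dataset yields dicts like this, padded_batch() takes its padding_values as a matching dict, and the training step reads each tensor out of the dict instead of receiving a single tensor.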


**Preparing the model**

Any model from HuggingFace's transformers that has a TensorFlow implementation can be loaded and used. To train on TPU, the model, the optimizer, and the metrics (including the loss) must be declared inside Strategy.scope().

```python
from transformers import TFGPT2LMHeadModel, GPT2Config

with strategy.scope():
    model_config = GPT2Config()
    model = TFGPT2LMHeadModel(model_config)

    # AdamW is only available as tf.keras.optimizers.AdamW in recent TF/Keras releases;
    # on older versions use the AdamW implementation from tensorflow_addons instead.
    optimizer = tf.keras.optimizers.AdamW(learning_rate=learning_rate)

    training_loss = tf.keras.metrics.Mean('train_loss', dtype=tf.float32)
    training_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
        'train_accuracy', dtype=tf.float32)
```

**Training the model**

There are broadly two ways to train: using Keras' model.fit() or writing a custom training loop with tf.GradientTape(). HuggingFace's TF models can also be trained with model.fit(), but because the inputs that can be fed to the model that way are limited, it falls short for models that, like BERT, need extra inputs such as an attention mask. We therefore describe the tf.GradientTape() approach here. The differences between the two approaches are explained in detail in the linked guide on using TPUs.

```python
@tf.function
def train_step(iterator):
    def step_fn(batch):
        # The dataset above yields a single padded tensor of input_ids per batch.
        input_ids = batch
        # For causal LM training, the labels are the input ids shifted by one position.
        labels = input_ids[:, 1:]

        with tf.GradientTape() as tape:
            outputs = model(input_ids)
            # Drop the last position so the logits align with the shifted labels.
            logits = outputs.logits[:, :-1]
            loss = tf.keras.losses.sparse_categorical_crossentropy(
                labels, logits, from_logits=True)
            # Reduce the per-token loss to a per-example loss, then average over the global batch.
            loss = tf.reduce_mean(loss, axis=-1)
            loss = tf.nn.compute_average_loss(loss, global_batch_size=batch_size)

        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(list(zip(grads, model.trainable_variables)))
        training_loss.update_state(loss * strategy.num_replicas_in_sync)
        training_accuracy.update_state(labels, logits)

    strategy.run(step_fn, args=(next(iterator),))


for epoch in range(max_epochs):
    print('Epoch: {}/{}'.format(epoch, max_epochs))

    for step in range(steps_per_epoch):
        train_step(train_iterator)

    print('Current step: {}, training loss: {}, accuracy: {}%'.format(
        optimizer.iterations.numpy(),
        round(float(training_loss.result()), 4),
        round(float(training_accuracy.result()) * 100, 2)))

    training_loss.reset_states()
    training_accuracy.reset_states()
```
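For comparison, a rough sketch of the model.fit() route mentioned above is shown below. It assumes a hypothetical dataset train_dataset_for_fit that yields (input_ids, labels) pairs (the loader built earlier yields only input_ids), and it only illustrates the pattern, not the method used in this post.

```python
# Sketch only: compile-and-fit alternative, assuming a dataset of (input_ids, labels) pairs.
with strategy.scope():
    model.compile(
        optimizer=optimizer,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

model.fit(train_dataset_for_fit,   # hypothetical dataset yielding (input_ids, labels)
          epochs=max_epochs,
          steps_per_epoch=steps_per_epoch)
```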

**Closing thoughts**

Using the TPU, we could clearly feel the speedup. Being able to take the wide range of models on the HuggingFace platform that so many researchers already use and train them with the officially recommended methods is, I think, very appealing. Since TPU v4 is close to twice as fast as v3, faster training and inference can be expected once it becomes generally available on GCP.

<\/span><\/div>","protected":false},"excerpt":{"rendered":"

[\uac00\uc0c1\uc778\uac04\uc5f0\uad6c\ud300 \ud669\uc900\uc120] TPU \uc18c\uac1c TPU(Tensor Processing Unit)\ub294 Google\uc5d0\uc11c \ubc1c\ud45c\ud55c \ud150\uc11c \uc5f0\uc0b0\uc5d0 \ud2b9\ud654\ub41c \ud558\ub4dc\uc6e8\uc5b4\uc785\ub2c8\ub2e4. TPU\ub294 \uc778\uacf5\uc9c0\ub2a5 \ubaa8\ub378\uc744 \ud559\uc2b5\uc2dc\ud0ac \ub54c \ud544\uc694\ud55c \ud589\ub82c \uacf1 \uc5f0\uc0b0\uc744 \uac00\uc18d\ud654\ud558\uc5ec \uae30\uc874 GPU\uc5d0\uc11c \ud559\uc2b5\uc2dc\ud0ac \ub54c\ubcf4\ub2e4 \ub354 \ube60\ub978 \ud559\uc2b5 \uc18d\ub3c4\ub97c \ubcf4\uc778\ub2e4\uace0 \uc54c\ub824\uc838 \uc788\uc2b5\ub2c8\ub2e4. \ud604\uc7ac TPU\ub294 v4\uae4c\uc9c0 \ucd9c\uc2dc\ub418\uc5c8\uc73c\uba70, \uc2a4\ud399\uc740 \uc544\ub798\uc640 \uac19\uc2b5\ub2c8\ub2e4. \ud604\uc7ac GCP(Google Cloud Platform)\uc5d0\uc11c\ub294 TPU v3\uae4c\uc9c0 \uc0ac\uc6a9\uac00\ub2a5\ud558\uba70, v4\ub294 \ub2f4\ub2f9\uc790 \ubb38\uc758\ub97c \ud574\uc57c\ud569\ub2c8\ub2e4. TPU \ud559\uc2b5 \ud658\uacbd…<\/p>\n

<\/span><\/div>","protected":false},"author":1,"featured_media":62195,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"rank_math_lock_modified_date":false,"footnotes":""},"categories":[532,19],"tags":[188],"class_list":["post-62186","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-nlp","category-tech04","tag-featured","category-532","category-19","description-off"],"_links":{"self":[{"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/posts\/62186","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/comments?post=62186"}],"version-history":[{"count":10,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/posts\/62186\/revisions"}],"predecessor-version":[{"id":62198,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/posts\/62186\/revisions\/62198"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/media\/62195"}],"wp:attachment":[{"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/media?parent=62186"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/categories?post=62186"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/smilegate.ai\/cn\/wp-json\/wp\/v2\/tags?post=62186"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}