{"id":6699,"date":"2025-06-04T10:42:05","date_gmt":"2025-06-04T01:42:05","guid":{"rendered":"https:\/\/blog.since2020.jp\/?p=6699"},"modified":"2025-06-04T10:42:05","modified_gmt":"2025-06-04T01:42:05","slug":"use_google_coraboratory_deal_google_document","status":"publish","type":"post","link":"https:\/\/since2020.jp\/media\/use_google_coraboratory_deal_google_document\/","title":{"rendered":"How to Process Google Docs with Google Colaboratory"},"content":{"rendered":"\n<p>This article shows how to use Google Colaboratory to automatically retrieve the contents of Google Docs and save the text as .txt files. It also covers the basics of API integration, so it is a good starting point if you are new to Google APIs.<\/p>\n\n\n<h2>Introduction<\/h2>\n<p>Google Docs is a convenient service for creating and sharing documents online. For natural language processing or data analysis, however, you will often want to retrieve a document's contents and process them with a program, for example in Python.<\/p>\n\n<h2>Environment and Requirements<\/h2>\n<p><strong>Execution environment<\/strong><\/p>\r\n<p><strong>Google Colaboratory<\/strong><\/p>\r\n<p>&nbsp;<\/p>\r\n<p><strong>APIs used<\/strong><\/p>\r\n<ul>\r\n\t<li>Google Drive API<\/li>\r\n\t<li>Google Docs API<\/li>\r\n<\/ul>\n\n<h2>Procedure<\/h2>\n<ol>\r\n\t<li>Use the Google Drive API to get the folder ID and file IDs<\/li>\r\n\t<li>Use the Google Docs API to get the text of each Google Docs file<\/li>\r\n\t<li>Save the retrieved text to text files<\/li>\r\n<\/ol>\n\n<h2>Code<\/h2>\n<p>Import the required libraries<\/p>\r\n<div class=\"hcb_wrap\">\r\n<pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import pandas as pd\r\nimport os\r\nimport glob\r\nfrom google.colab import auth\r\nfrom googleapiclient.discovery import build\r\nimport requests\r\nimport json\r\nimport unicodedata<\/code><\/pre>\r\n<\/div>\r\n<p>Google 
Colab authentication (mounts Google Drive)<\/p>\r\n<div class=\"hcb_wrap\">\r\n<pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>from google.colab import drive\r\ndrive.mount(\"\/content\/drive\")<\/code><\/pre>\r\n<\/div>\r\n<p>Google Cloud authentication<\/p>\r\n<div class=\"hcb_wrap\">\r\n<pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>auth.authenticate_user()\r\n\r\ndrive_service = build('drive', 'v3')\r\ndocs_service = build('docs', 'v1')<\/code><\/pre>\r\n<\/div>\r\n<p>Get the folder ID<\/p>\r\n<div class=\"hcb_wrap\">\r\n<pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>path = \"\/content\/drive\/MyDrive\/test\"\r\n# Google Docs files in the mounted folder\r\nfiles = glob.glob(os.path.join(path, \"*.gdoc\"))\r\n\r\n# Look up the Drive folder by name\r\nfolder_name = os.path.basename(path)\r\nquery = f\"mimeType = 'application\/vnd.google-apps.folder' and name = '{folder_name}'\"\r\n\r\nresults = drive_service.files().list(q=query, fields=\"files(id, name)\").execute()\r\nfolders = results.get('files', [])\r\nif not folders:\r\n    raise FileNotFoundError(f\"Folder '{folder_name}' was not found on Google Drive\")\r\nfolder_id = folders[0][\"id\"]<\/code><\/pre>\r\n<\/div>\r\n<p>Get the file IDs and document text<\/p>\r\n<div class=\"hcb_wrap\">\r\n<pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>query = f\"'{folder_id}' in parents and mimeType = 'application\/vnd.google-apps.document'\"\r\nresults = drive_service.files().list(q=query, spaces='drive').execute()\r\n\r\n# Map each normalized document name to its file ID\r\ndicts = {}\r\nitems = results.get('files', [])\r\nfor item in items:\r\n    normalized_name = unicodedata.normalize('NFKC', item['name'])  # normalize\r\n    dicts[normalized_name] = item['id']\r\n\r\n# Base names (without the .gdoc extension) of the files found in the mounted folder\r\nlocal_names = {\r\n    unicodedata.normalize('NFKC', os.path.splitext(os.path.basename(f))[0])\r\n    for f in files\r\n}\r\n\r\nfile_id = []\r\nfile_name = []\r\nfor key in dicts:\r\n    if key in local_names:\r\n        file_id.append(dicts[key])\r\n        file_name.append(key)\r\n\r\n# Get the text portion of each document\r\nknowledge = {}\r\nfor doc_id, name in zip(file_id, file_name):\r\n    document = docs_service.documents().get(documentId=doc_id).execute()\r\n    content = document.get('body').get('content')\r\n\r\n    texts = []\r\n    for elem in content:\r\n        if 'paragraph' in elem:\r\n            for para_elem in elem['paragraph']['elements']:\r\n                # Skip elements that carry no text run (for example inline objects)\r\n                text = para_elem.get('textRun', {}).get('content', '')\r\n                if text and text != \"\\n\":\r\n                    texts.append(text)\r\n    # Drop the first six text runs; adjust this offset to your documents' layout\r\n    texts = texts[6:]\r\n    knowledge[name] = texts<\/code><\/pre>\r\n<\/div>\r\n<p>Convert to .txt<\/p>\r\n<div class=\"hcb_wrap\">\r\n<pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import re  # used by sanitize_filename\r\n\r\ndef sanitize_filename(file_name):\r\n    # Replace characters that are not allowed in file names with '_'\r\n    invalid_chars = r'[&lt;&gt;\uff1a:\"\/\\\\|?*]'\r\n    return re.sub(invalid_chars, '_', file_name).strip()\r\n\r\n# Make sure the output folder exists before writing\r\noutput_dir = os.path.join(path, \"output\")\r\nos.makedirs(output_dir, exist_ok=True)\r\n\r\nfor file_name, file_text in knowledge.items():\r\n    if file_text:\r\n        sanitized_file_name = sanitize_filename(file_name)\r\n        file_path = os.path.join(output_dir, f\"{sanitized_file_name}.txt\")\r\n        with open(file_path, \"w\", encoding=\"utf-8\") as file:\r\n            # Write all text runs in order\r\n            file.write(\"\".join(file_text))<\/code><\/pre>\r\n<\/div>\r\n<p>By following these steps, you can convert Google Docs files into text files.<\/p>\n\n<h2>Conclusion<\/h2>\n<p>Connecting Colab with Google Docs lets you <strong>work with cloud-hosted document data directly from a program<\/strong>. With the method shown here, you can use the information in Google Docs as raw material for text processing or as input to machine learning preprocessing.<\/p>\r\n<p>Building on this approach, you could batch-process multiple documents or apply natural language processing (NLP) to the retrieved content, leading to more advanced analysis.<\/p>","protected":false},"excerpt":{"rendered":"<p>This article shows how to use Google Colaboratory to automatically retrieve the contents of Google Docs and save the text as .txt files. It also covers the basics of API integration, so it is a good starting point if you are new to Google 
[&hellip;]<\/p>\n","protected":false},"author":19,"featured_media":3123,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","swell_btn_cv_data":"","footnotes":"","_wp_rev_ctl_limit":""},"categories":[1249],"tags":[367,972,973,971,331],"class_list":["post-6699","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-knowledge","tag-google-colabratory","tag-google-docs","tag-google-document","tag-google-drive","tag-python"],"_links":{"self":[{"href":"https:\/\/since2020.jp\/media\/wp-json\/wp\/v2\/posts\/6699","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/since2020.jp\/media\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/since2020.jp\/media\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/since2020.jp\/media\/wp-json\/wp\/v2\/users\/19"}],"replies":[{"embeddable":true,"href":"https:\/\/since2020.jp\/media\/wp-json\/wp\/v2\/comments?post=6699"}],"version-history":[{"count":0,"href":"https:\/\/since2020.jp\/media\/wp-json\/wp\/v2\/posts\/6699\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/since2020.jp\/media\/wp-json\/wp\/v2\/media\/3123"}],"wp:attachment":[{"href":"https:\/\/since2020.jp\/media\/wp-json\/wp\/v2\/media?parent=6699"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/since2020.jp\/media\/wp-json\/wp\/v2\/categories?post=6699"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/since2020.jp\/media\/wp-json\/wp\/v2\/tags?post=6699"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}