{ "cells": [ { "cell_type": "markdown", "id": "41fb9eda-8d3e-43e6-ac29-ed5bb545f0e3", "metadata": {}, "source": [ "# データサイエンス100本ノック 31~40" ] }, { "cell_type": "code", "execution_count": 3, "id": "bfa0c462-fa3a-443e-b011-1094b8e3d629", "metadata": {}, "outputs": [], "source": [ "import polars as pl\n", "from helper.polars import load_100knocks_data\n", "pl.Config.set_fmt_str_lengths(100)\n", "df_customer, df_category, df_product, df_receipt, df_store, df_geocode = load_100knocks_data()" ] }, { "cell_type": "markdown", "id": "cdd48c24-9412-40e8-b66a-65b9c3070478", "metadata": {}, "source": [ "## P-031\n", "\n", "レシート明細データ(df_receipt)に対し、店舗コード(store_cd)ごとに売上金額(amount)の標準偏差を計算し、降順で5件表示せよ。" ] }, { "cell_type": "code", "execution_count": 4, "id": "885637b6-b1c8-4ab2-bd31-8d715c63682e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (5, 2)
store_cdstd_amount
strf64
"S13052"663.391816
"S14011"553.456916
"S14034"544.903736
"S13001"543.536561
"S13015"543.409938
" ], "text/plain": [ "shape: (5, 2)\n", "┌──────────┬────────────┐\n", "│ store_cd ┆ std_amount │\n", "│ --- ┆ --- │\n", "│ str ┆ f64 │\n", "╞══════════╪════════════╡\n", "│ S13052 ┆ 663.391816 │\n", "│ S14011 ┆ 553.456916 │\n", "│ S14034 ┆ 544.903736 │\n", "│ S13001 ┆ 543.536561 │\n", "│ S13015 ┆ 543.409938 │\n", "└──────────┴────────────┘" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", "df_receipt\n", ".group_by('store_cd')\n", ".agg(\n", " std_amount=pl.col.amount.std(ddof=0)\n", ")\n", ".sort(by='std_amount', descending=True)\n", ".head()\n", ")" ] }, { "cell_type": "markdown", "id": "484894b8-71a5-424e-a5ef-622a5222f29b", "metadata": {}, "source": [ "## P-032\n", "\n", "レシート明細データ(df_receipt)の売上金額(amount)について、25%刻みでパーセンタイル値を求めよ。" ] }, { "cell_type": "code", "execution_count": 5, "id": "5760fc3d-e854-4c53-a5da-2d5db1e1174d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (1, 4)
amount_25peramount_50peramount_75peramount_100per
f64f64f64f64
102.0170.0288.010925.0
" ], "text/plain": [ "shape: (1, 4)\n", "┌──────────────┬──────────────┬──────────────┬───────────────┐\n", "│ amount_25per ┆ amount_50per ┆ amount_75per ┆ amount_100per │\n", "│ --- ┆ --- ┆ --- ┆ --- │\n", "│ f64 ┆ f64 ┆ f64 ┆ f64 │\n", "╞══════════════╪══════════════╪══════════════╪═══════════════╡\n", "│ 102.0 ┆ 170.0 ┆ 288.0 ┆ 10925.0 │\n", "└──────────────┴──────────────┴──────────────┴───────────────┘" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_receipt.select(\n", " [pl.col.amount\n", " .quantile(per / 100)\n", " .alias(f'amount_{per}per') \n", " for per in [25, 50, 75, 100]\n", " ]\n", ")" ] }, { "cell_type": "markdown", "id": "5d95df60-0825-4a2c-b482-4f21ec857f27", "metadata": {}, "source": [ "## P-033\n", "\n", "レシート明細データ(df_receipt)に対し、店舗コード(store_cd)ごとに売上金額(amount)の平均を計算し、330以上のものを抽出せよ。" ] }, { "cell_type": "code", "execution_count": 6, "id": "0d4cc62a-a88c-4fe9-81af-fecdb1bb7825", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (13, 2)
store_cdavg_amount
strf64
"S13003"350.915519
"S13020"337.879932
"S14026"332.340588
"S13015"351.11196
"S14045"330.082073
"S14010"348.791262
"S12013"330.19413
"S13001"348.470386
"S14047"330.077073
"S13052"402.86747
" ], "text/plain": [ "shape: (13, 2)\n", "┌──────────┬────────────┐\n", "│ store_cd ┆ avg_amount │\n", "│ --- ┆ --- │\n", "│ str ┆ f64 │\n", "╞══════════╪════════════╡\n", "│ S13003 ┆ 350.915519 │\n", "│ S13020 ┆ 337.879932 │\n", "│ S14026 ┆ 332.340588 │\n", "│ S13015 ┆ 351.11196 │\n", "│ S14045 ┆ 330.082073 │\n", "│ … ┆ … │\n", "│ S14010 ┆ 348.791262 │\n", "│ S12013 ┆ 330.19413 │\n", "│ S13001 ┆ 348.470386 │\n", "│ S14047 ┆ 330.077073 │\n", "│ S13052 ┆ 402.86747 │\n", "└──────────┴────────────┘" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", "df_receipt\n", ".group_by('store_cd', maintain_order=True)\n", ".agg(\n", " avg_amount=pl.col.amount.mean()\n", ")\n", ".filter(pl.col.avg_amount >= 330)\n", ")" ] }, { "cell_type": "markdown", "id": "0556aeab-f537-44bd-ae88-6d7c371913a8", "metadata": {}, "source": [ "## P-034\n", "\n", "レシート明細データ(df_receipt)に対し、顧客ID(customer_id)ごとに売上金額(amount)を合計して全顧客の平均を求めよ。ただし、顧客IDが\"Z\"から始まるものは非会員を表すため、除外して計算すること。" ] }, { "cell_type": "code", "execution_count": 7, "id": "b52a38ad-48ea-406e-aee6-aea45e0ba72c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2547.742234529256" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", "df_receipt\n", ".filter(\n", " pl.col.customer_id.str.starts_with('Z').not_()\n", ")\n", ".group_by('customer_id')\n", ".agg(\n", " pl.col.amount.sum()\n", ")\n", ".select(pl.col.amount.mean())\n", ".item()\n", ")" ] }, { "cell_type": "markdown", "id": "1da52788-2cef-4119-b8d6-89c7dfed9d40", "metadata": {}, "source": [ "## P-035\n", "\n", "レシート明細データ(df_receipt)に対し、顧客ID(customer_id)ごとに売上金額(amount)を合計して全顧客の平均を求め、平均以上に買い物をしている顧客を抽出し、10件表示せよ。ただし、顧客IDが\"Z\"から始まるものは非会員を表すため、除外して計算すること。" ] }, { "cell_type": "code", "execution_count": 8, "id": "488e340a-aaed-4123-b5a4-891ac113f78c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (10, 2)
customer_idsum_amount
stri64
"CS003515000047"4424
"CS006515000083"7115
"CS019415000147"12764
"CS034214000029"4570
"CS003505000023"3778
"CS010415000134"3442
"CS021514000045"9741
"CS006414000037"6416
"CS045615000002"2888
"CS022515000115"7146
" ], "text/plain": [ "shape: (10, 2)\n", "┌────────────────┬────────────┐\n", "│ customer_id ┆ sum_amount │\n", "│ --- ┆ --- │\n", "│ str ┆ i64 │\n", "╞════════════════╪════════════╡\n", "│ CS003515000047 ┆ 4424 │\n", "│ CS006515000083 ┆ 7115 │\n", "│ CS019415000147 ┆ 12764 │\n", "│ CS034214000029 ┆ 4570 │\n", "│ CS003505000023 ┆ 3778 │\n", "│ CS010415000134 ┆ 3442 │\n", "│ CS021514000045 ┆ 9741 │\n", "│ CS006414000037 ┆ 6416 │\n", "│ CS045615000002 ┆ 2888 │\n", "│ CS022515000115 ┆ 7146 │\n", "└────────────────┴────────────┘" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", "df_receipt\n", ".lazy()\n", ".filter(\n", " pl.col.customer_id.str.starts_with('Z').not_()\n", ")\n", ".group_by('customer_id')\n", ".agg(\n", " sum_amount = pl.col.amount.sum()\n", ")\n", ".filter(\n", " pl.col.sum_amount >= pl.col.sum_amount.mean()\n", ")\n", ".collect()\n", ".head(10)\n", ")" ] }, { "cell_type": "markdown", "id": "da11daa5-cc76-4946-b047-08cc1b013646", "metadata": {}, "source": [ "## P-036\n", "\n", "レシート明細データ(df_receipt)と店舗データ(df_store)を内部結合し、レシート明細データの全項目と店舗データの店舗名(store_name)を10件表示せよ。" ] }, { "cell_type": "code", "execution_count": 9, "id": "5b283cca-cd70-42f5-998f-6dc6952c5dd6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (10, 10)
sales_ymdsales_epochstore_cdreceipt_noreceipt_sub_nocustomer_idproduct_cdquantityamountstore_name
i64i64stri64i64strstri64i64str
201811031541203200"S14006"1121"CS006214000001""P070305012"1158"葛が谷店"
201811181542499200"S13008"11322"CS008415000097""P070701017"181"成城店"
201707121499817600"S14028"11021"CS028414000014""P060101005"1170"二ツ橋店"
201902051549324800"S14042"11321"ZZ000000000000""P050301001"125"新山下店"
201808211534809600"S14025"11022"CS025415000050""P060102007"190"大和店"
201906051559692800"S13003"11121"CS003515000195""P050102002"1138"狛江店"
201812051543968000"S14024"11022"CS024514000042""P080101005"130"三田店"
201909221569110400"S14040"11021"CS040415000178""P070501004"1128"長津田店"
201705041493856000"S13020"11122"ZZ000000000000""P071302010"1770"十条仲原店"
201910101570665600"S14027"11021"CS027514000015""P071101003"1680"南藤沢店"
" ], "text/plain": [ "shape: (10, 10)\n", "┌───────────┬─────────────┬──────────┬────────────┬───┬────────────┬──────────┬────────┬────────────┐\n", "│ sales_ymd ┆ sales_epoch ┆ store_cd ┆ receipt_no ┆ … ┆ product_cd ┆ quantity ┆ amount ┆ store_name │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ i64 ┆ i64 ┆ str ┆ i64 ┆ ┆ str ┆ i64 ┆ i64 ┆ str │\n", "╞═══════════╪═════════════╪══════════╪════════════╪═══╪════════════╪══════════╪════════╪════════════╡\n", "│ 20181103 ┆ 1541203200 ┆ S14006 ┆ 112 ┆ … ┆ P070305012 ┆ 1 ┆ 158 ┆ 葛が谷店 │\n", "│ 20181118 ┆ 1542499200 ┆ S13008 ┆ 1132 ┆ … ┆ P070701017 ┆ 1 ┆ 81 ┆ 成城店 │\n", "│ 20170712 ┆ 1499817600 ┆ S14028 ┆ 1102 ┆ … ┆ P060101005 ┆ 1 ┆ 170 ┆ 二ツ橋店 │\n", "│ 20190205 ┆ 1549324800 ┆ S14042 ┆ 1132 ┆ … ┆ P050301001 ┆ 1 ┆ 25 ┆ 新山下店 │\n", "│ 20180821 ┆ 1534809600 ┆ S14025 ┆ 1102 ┆ … ┆ P060102007 ┆ 1 ┆ 90 ┆ 大和店 │\n", "│ 20190605 ┆ 1559692800 ┆ S13003 ┆ 1112 ┆ … ┆ P050102002 ┆ 1 ┆ 138 ┆ 狛江店 │\n", "│ 20181205 ┆ 1543968000 ┆ S14024 ┆ 1102 ┆ … ┆ P080101005 ┆ 1 ┆ 30 ┆ 三田店 │\n", "│ 20190922 ┆ 1569110400 ┆ S14040 ┆ 1102 ┆ … ┆ P070501004 ┆ 1 ┆ 128 ┆ 長津田店 │\n", "│ 20170504 ┆ 1493856000 ┆ S13020 ┆ 1112 ┆ … ┆ P071302010 ┆ 1 ┆ 770 ┆ 十条仲原店 │\n", "│ 20191010 ┆ 1570665600 ┆ S14027 ┆ 1102 ┆ … ┆ P071101003 ┆ 1 ┆ 680 ┆ 南藤沢店 │\n", "└───────────┴─────────────┴──────────┴────────────┴───┴────────────┴──────────┴────────┴────────────┘" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", "df_receipt\n", ".join(\n", " df_store.select('store_cd', 'store_name'), \n", " on='store_cd'\n", ")\n", ".head(10)\n", ")" ] }, { "cell_type": "markdown", "id": "4f60f07b-98c6-405a-b4a5-9292176535c3", "metadata": {}, "source": [ "## P-037\n", "\n", "商品データ(df_product)とカテゴリデータ(df_category)を内部結合し、商品データの全項目とカテゴリデータのカテゴリ小区分名(category_small_name)を10件表示せよ。" ] }, { "cell_type": "code", "execution_count": 10, "id": "13c3f47c-05ec-4387-a2ec-e64bff66e44b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (10, 7)
product_cdcategory_major_cdcategory_medium_cdcategory_small_cdunit_priceunit_costcategory_small_name
strstrstrstri64i64str
"P040101001""04""0401""040101"198149"弁当類"
"P040101002""04""0401""040101"218164"弁当類"
"P040101003""04""0401""040101"230173"弁当類"
"P040101004""04""0401""040101"248186"弁当類"
"P040101005""04""0401""040101"268201"弁当類"
"P040101006""04""0401""040101"298224"弁当類"
"P040101007""04""0401""040101"338254"弁当類"
"P040101008""04""0401""040101"420315"弁当類"
"P040101009""04""0401""040101"498374"弁当類"
"P040101010""04""0401""040101"580435"弁当類"
" ], "text/plain": [ "shape: (10, 7)\n", "┌────────────┬───────────────────┬────────────────────┬───────────────────┬────────────┬───────────┬───────────────────┐\n", "│ product_cd ┆ category_major_cd ┆ category_medium_cd ┆ category_small_cd ┆ unit_price ┆ unit_cost ┆ category_small_na │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ me │\n", "│ str ┆ str ┆ str ┆ str ┆ i64 ┆ i64 ┆ --- │\n", "│ ┆ ┆ ┆ ┆ ┆ ┆ str │\n", "╞════════════╪═══════════════════╪════════════════════╪═══════════════════╪════════════╪═══════════╪═══════════════════╡\n", "│ P040101001 ┆ 04 ┆ 0401 ┆ 040101 ┆ 198 ┆ 149 ┆ 弁当類 │\n", "│ P040101002 ┆ 04 ┆ 0401 ┆ 040101 ┆ 218 ┆ 164 ┆ 弁当類 │\n", "│ P040101003 ┆ 04 ┆ 0401 ┆ 040101 ┆ 230 ┆ 173 ┆ 弁当類 │\n", "│ P040101004 ┆ 04 ┆ 0401 ┆ 040101 ┆ 248 ┆ 186 ┆ 弁当類 │\n", "│ P040101005 ┆ 04 ┆ 0401 ┆ 040101 ┆ 268 ┆ 201 ┆ 弁当類 │\n", "│ P040101006 ┆ 04 ┆ 0401 ┆ 040101 ┆ 298 ┆ 224 ┆ 弁当類 │\n", "│ P040101007 ┆ 04 ┆ 0401 ┆ 040101 ┆ 338 ┆ 254 ┆ 弁当類 │\n", "│ P040101008 ┆ 04 ┆ 0401 ┆ 040101 ┆ 420 ┆ 315 ┆ 弁当類 │\n", "│ P040101009 ┆ 04 ┆ 0401 ┆ 040101 ┆ 498 ┆ 374 ┆ 弁当類 │\n", "│ P040101010 ┆ 04 ┆ 0401 ┆ 040101 ┆ 580 ┆ 435 ┆ 弁当類 │\n", "└────────────┴───────────────────┴────────────────────┴───────────────────┴────────────┴───────────┴───────────────────┘" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", "df_product\n", ".join(\n", " df_category.select(\n", " 'category_small_cd', \n", " 'category_small_name'\n", " ),\n", " on='category_small_cd'\n", ")\n", ".head(10)\n", ")" ] }, { "cell_type": "markdown", "id": "1d71f22e-82dd-4535-9ec6-b46453862f5e", "metadata": {}, "source": [ "## P-038\n", "\n", "顧客データ(df_customer)とレシート明細データ(df_receipt)から、顧客ごとの売上金額合計を求め、10件表示せよ。ただし、売上実績がない顧客については売上金額を0として表示させること。また、顧客は性別コード(gender_cd)が女性(1)であるものを対象とし、非会員(顧客IDが\"Z\"から始まるもの)は除外すること。" ] }, { "cell_type": "code", "execution_count": 11, "id": "8804455b-973d-4148-b725-cd49c3609434", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (10, 2)
customer_idsum_amount
stri64
"CS021313000114"0
"CS031415000172"5088
"CS028811000001"0
"CS001215000145"875
"CS015414000103"3122
"CS033513000180"868
"CS035614000014"0
"CS011215000048"3444
"CS009413000079"0
"CS040412000191"210
" ], "text/plain": [ "shape: (10, 2)\n", "┌────────────────┬────────────┐\n", "│ customer_id ┆ sum_amount │\n", "│ --- ┆ --- │\n", "│ str ┆ i64 │\n", "╞════════════════╪════════════╡\n", "│ CS021313000114 ┆ 0 │\n", "│ CS031415000172 ┆ 5088 │\n", "│ CS028811000001 ┆ 0 │\n", "│ CS001215000145 ┆ 875 │\n", "│ CS015414000103 ┆ 3122 │\n", "│ CS033513000180 ┆ 868 │\n", "│ CS035614000014 ┆ 0 │\n", "│ CS011215000048 ┆ 3444 │\n", "│ CS009413000079 ┆ 0 │\n", "│ CS040412000191 ┆ 210 │\n", "└────────────────┴────────────┘" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", "df_customer\n", ".lazy()\n", ".filter(\n", " (pl.col.gender_cd == \"1\") & \n", " (pl.col.customer_id.str.starts_with('Z').not_())\n", ")\n", ".select('customer_id')\n", ".join(\n", " df_receipt\n", " .lazy()\n", " .group_by('customer_id')\n", " .agg(\n", " sum_amount = pl.col.amount.sum()\n", " ),\n", " on='customer_id',\n", " how='left'\n", ")\n", ".with_columns(\n", " pl.col.sum_amount.fill_null(0)\n", ")\n", ".head(10)\n", ".collect()\n", ")" ] }, { "cell_type": "markdown", "id": "a9436c54-16ab-4bef-99f9-5bdbe397388e", "metadata": {}, "source": [ "## P-039\n", "\n", "レシート明細データ(df_receipt)から、売上日数の多い顧客の上位20件を抽出したデータと、売上金額合計の多い顧客の上位20件を抽出したデータをそれぞれ作成し、さらにその2つを完全外部結合せよ。ただし、非会員(顧客IDが\"Z\"から始まるもの)は除外すること。" ] }, { "cell_type": "code", "execution_count": 14, "id": "92f8e801-e9c1-4bcc-8e26-115e5cf03a92", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (34, 3)
customer_idcome_daysbuy_amount
stru32i64
"CS017415000097"2023086
"CS015415000185"2220153
"CS031414000051"1919202
"CS028415000007"2119127
"CS001605000009"null18925
"CS014214000023"19null
"CS022515000028"18null
"CS010214000002"21null
"CS039414000052"19null
"CS030214000008"18null
" ], "text/plain": [ "shape: (34, 3)\n", "┌────────────────┬───────────┬────────────┐\n", "│ customer_id ┆ come_days ┆ buy_amount │\n", "│ --- ┆ --- ┆ --- │\n", "│ str ┆ u32 ┆ i64 │\n", "╞════════════════╪═══════════╪════════════╡\n", "│ CS017415000097 ┆ 20 ┆ 23086 │\n", "│ CS015415000185 ┆ 22 ┆ 20153 │\n", "│ CS031414000051 ┆ 19 ┆ 19202 │\n", "│ CS028415000007 ┆ 21 ┆ 19127 │\n", "│ CS001605000009 ┆ null ┆ 18925 │\n", "│ … ┆ … ┆ … │\n", "│ CS014214000023 ┆ 19 ┆ null │\n", "│ CS022515000028 ┆ 18 ┆ null │\n", "│ CS010214000002 ┆ 21 ┆ null │\n", "│ CS039414000052 ┆ 19 ┆ null │\n", "│ CS030214000008 ┆ 18 ┆ null │\n", "└────────────────┴───────────┴────────────┘" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_data = df_receipt.filter(\n", " pl.col.customer_id.str.starts_with('Z').not_()\n", ")\n", "\n", "group = df_data.group_by('customer_id')\n", "\n", "df_cnt = (\n", " group\n", " .agg(\n", " come_days = pl.col.sales_ymd.n_unique()\n", " )\n", " .top_k(20, by='come_days')\n", ")\n", "\n", "df_sum = (\n", " group\n", " .agg(\n", " buy_amount = pl.col.amount.sum()\n", " )\n", " .top_k(20, by='buy_amount')\n", ")\n", "\n", "df_cnt.join(df_sum, on='customer_id', how='full', coalesce=True)" ] }, { "cell_type": "markdown", "id": "f12d65df-c76c-4060-b882-a0dfc3616c59", "metadata": {}, "source": [ "## P-040\n", "\n", "全ての店舗と全ての商品を組み合わせたデータを作成したい。店舗データ(df_store)と商品データ(df_product)を直積し、件数を計算せよ。" ] }, { "cell_type": "code", "execution_count": 13, "id": "a062bd63-fdde-49f8-a29a-abb89365115c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "531590" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", "df_store\n", ".join(df_product, how='cross')\n", ".select(pl.len())\n", ".item()\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 5 }