{
"cells": [
{
"cell_type": "markdown",
"id": "41fb9eda-8d3e-43e6-ac29-ed5bb545f0e3",
"metadata": {},
"source": [
"# データサイエンス100本ノック 31~40"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bfa0c462-fa3a-443e-b011-1094b8e3d629",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"from helper.polars import load_100knocks_data\n",
"pl.Config.set_fmt_str_lengths(100)\n",
"df_customer, df_category, df_product, df_receipt, df_store, df_geocode = load_100knocks_data()"
]
},
{
"cell_type": "markdown",
"id": "cdd48c24-9412-40e8-b66a-65b9c3070478",
"metadata": {},
"source": [
"## P-031\n",
"\n",
"レシート明細データ(df_receipt)に対し、店舗コード(store_cd)ごとに売上金額(amount)の標準偏差を計算し、降順で5件表示せよ。"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "885637b6-b1c8-4ab2-bd31-8d715c63682e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
shape: (5, 2)store_cd | std_amount |
---|
str | f64 |
"S13052" | 663.391816 |
"S14011" | 553.456916 |
"S14034" | 544.903736 |
"S13001" | 543.536561 |
"S13015" | 543.409938 |
"
],
"text/plain": [
"shape: (5, 2)\n",
"┌──────────┬────────────┐\n",
"│ store_cd ┆ std_amount │\n",
"│ --- ┆ --- │\n",
"│ str ┆ f64 │\n",
"╞══════════╪════════════╡\n",
"│ S13052 ┆ 663.391816 │\n",
"│ S14011 ┆ 553.456916 │\n",
"│ S14034 ┆ 544.903736 │\n",
"│ S13001 ┆ 543.536561 │\n",
"│ S13015 ┆ 543.409938 │\n",
"└──────────┴────────────┘"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".group_by('store_cd')\n",
".agg(\n",
" std_amount=pl.col.amount.std(ddof=0)\n",
")\n",
".sort(by='std_amount', descending=True)\n",
".head()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "484894b8-71a5-424e-a5ef-622a5222f29b",
"metadata": {},
"source": [
"## P-032\n",
"\n",
"レシート明細データ(df_receipt)の売上金額(amount)について、25%刻みでパーセンタイル値を求めよ。"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "5760fc3d-e854-4c53-a5da-2d5db1e1174d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (1, 4)amount_25per | amount_50per | amount_75per | amount_100per |
---|
f64 | f64 | f64 | f64 |
102.0 | 170.0 | 288.0 | 10925.0 |
"
],
"text/plain": [
"shape: (1, 4)\n",
"┌──────────────┬──────────────┬──────────────┬───────────────┐\n",
"│ amount_25per ┆ amount_50per ┆ amount_75per ┆ amount_100per │\n",
"│ --- ┆ --- ┆ --- ┆ --- │\n",
"│ f64 ┆ f64 ┆ f64 ┆ f64 │\n",
"╞══════════════╪══════════════╪══════════════╪═══════════════╡\n",
"│ 102.0 ┆ 170.0 ┆ 288.0 ┆ 10925.0 │\n",
"└──────────────┴──────────────┴──────────────┴───────────────┘"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_receipt.select(\n",
" [pl.col.amount\n",
" .quantile(per / 100)\n",
" .alias(f'amount_{per}per') \n",
" for per in [25, 50, 75, 100]\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5d95df60-0825-4a2c-b482-4f21ec857f27",
"metadata": {},
"source": [
"## P-033\n",
"\n",
"レシート明細データ(df_receipt)に対し、店舗コード(store_cd)ごとに売上金額(amount)の平均を計算し、330以上のものを抽出せよ。"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0d4cc62a-a88c-4fe9-81af-fecdb1bb7825",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (13, 2)store_cd | avg_amount |
---|
str | f64 |
"S13003" | 350.915519 |
"S13020" | 337.879932 |
"S14026" | 332.340588 |
"S13015" | 351.11196 |
"S14045" | 330.082073 |
… | … |
"S14010" | 348.791262 |
"S12013" | 330.19413 |
"S13001" | 348.470386 |
"S14047" | 330.077073 |
"S13052" | 402.86747 |
"
],
"text/plain": [
"shape: (13, 2)\n",
"┌──────────┬────────────┐\n",
"│ store_cd ┆ avg_amount │\n",
"│ --- ┆ --- │\n",
"│ str ┆ f64 │\n",
"╞══════════╪════════════╡\n",
"│ S13003 ┆ 350.915519 │\n",
"│ S13020 ┆ 337.879932 │\n",
"│ S14026 ┆ 332.340588 │\n",
"│ S13015 ┆ 351.11196 │\n",
"│ S14045 ┆ 330.082073 │\n",
"│ … ┆ … │\n",
"│ S14010 ┆ 348.791262 │\n",
"│ S12013 ┆ 330.19413 │\n",
"│ S13001 ┆ 348.470386 │\n",
"│ S14047 ┆ 330.077073 │\n",
"│ S13052 ┆ 402.86747 │\n",
"└──────────┴────────────┘"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".group_by('store_cd', maintain_order=True)\n",
".agg(\n",
" avg_amount=pl.col.amount.mean()\n",
")\n",
".filter(pl.col.avg_amount >= 330)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "0556aeab-f537-44bd-ae88-6d7c371913a8",
"metadata": {},
"source": [
"## P-034\n",
"\n",
"レシート明細データ(df_receipt)に対し、顧客ID(customer_id)ごとに売上金額(amount)を合計して全顧客の平均を求めよ。ただし、顧客IDが\"Z\"から始まるものは非会員を表すため、除外して計算すること。"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b52a38ad-48ea-406e-aee6-aea45e0ba72c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2547.742234529256"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".filter(\n",
" pl.col.customer_id.str.starts_with('Z').not_()\n",
")\n",
".group_by('customer_id')\n",
".agg(\n",
" pl.col.amount.sum()\n",
")\n",
".select(pl.col.amount.mean())\n",
".item()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "1da52788-2cef-4119-b8d6-89c7dfed9d40",
"metadata": {},
"source": [
"## P-035\n",
"\n",
"レシート明細データ(df_receipt)に対し、顧客ID(customer_id)ごとに売上金額(amount)を合計して全顧客の平均を求め、平均以上に買い物をしている顧客を抽出し、10件表示せよ。ただし、顧客IDが\"Z\"から始まるものは非会員を表すため、除外して計算すること。"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "488e340a-aaed-4123-b5a4-891ac113f78c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (10, 2)customer_id | sum_amount |
---|
str | i64 |
"CS003515000047" | 4424 |
"CS006515000083" | 7115 |
"CS019415000147" | 12764 |
"CS034214000029" | 4570 |
"CS003505000023" | 3778 |
"CS010415000134" | 3442 |
"CS021514000045" | 9741 |
"CS006414000037" | 6416 |
"CS045615000002" | 2888 |
"CS022515000115" | 7146 |
"
],
"text/plain": [
"shape: (10, 2)\n",
"┌────────────────┬────────────┐\n",
"│ customer_id ┆ sum_amount │\n",
"│ --- ┆ --- │\n",
"│ str ┆ i64 │\n",
"╞════════════════╪════════════╡\n",
"│ CS003515000047 ┆ 4424 │\n",
"│ CS006515000083 ┆ 7115 │\n",
"│ CS019415000147 ┆ 12764 │\n",
"│ CS034214000029 ┆ 4570 │\n",
"│ CS003505000023 ┆ 3778 │\n",
"│ CS010415000134 ┆ 3442 │\n",
"│ CS021514000045 ┆ 9741 │\n",
"│ CS006414000037 ┆ 6416 │\n",
"│ CS045615000002 ┆ 2888 │\n",
"│ CS022515000115 ┆ 7146 │\n",
"└────────────────┴────────────┘"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".lazy()\n",
".filter(\n",
" pl.col.customer_id.str.starts_with('Z').not_()\n",
")\n",
".group_by('customer_id')\n",
".agg(\n",
" sum_amount = pl.col.amount.sum()\n",
")\n",
".filter(\n",
" pl.col.sum_amount >= pl.col.sum_amount.mean()\n",
")\n",
".collect()\n",
".head(10)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "da11daa5-cc76-4946-b047-08cc1b013646",
"metadata": {},
"source": [
"## P-036\n",
"\n",
"レシート明細データ(df_receipt)と店舗データ(df_store)を内部結合し、レシート明細データの全項目と店舗データの店舗名(store_name)を10件表示せよ。"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "5b283cca-cd70-42f5-998f-6dc6952c5dd6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (10, 10)sales_ymd | sales_epoch | store_cd | receipt_no | receipt_sub_no | customer_id | product_cd | quantity | amount | store_name |
---|
i64 | i64 | str | i64 | i64 | str | str | i64 | i64 | str |
20181103 | 1541203200 | "S14006" | 112 | 1 | "CS006214000001" | "P070305012" | 1 | 158 | "葛が谷店" |
20181118 | 1542499200 | "S13008" | 1132 | 2 | "CS008415000097" | "P070701017" | 1 | 81 | "成城店" |
20170712 | 1499817600 | "S14028" | 1102 | 1 | "CS028414000014" | "P060101005" | 1 | 170 | "二ツ橋店" |
20190205 | 1549324800 | "S14042" | 1132 | 1 | "ZZ000000000000" | "P050301001" | 1 | 25 | "新山下店" |
20180821 | 1534809600 | "S14025" | 1102 | 2 | "CS025415000050" | "P060102007" | 1 | 90 | "大和店" |
20190605 | 1559692800 | "S13003" | 1112 | 1 | "CS003515000195" | "P050102002" | 1 | 138 | "狛江店" |
20181205 | 1543968000 | "S14024" | 1102 | 2 | "CS024514000042" | "P080101005" | 1 | 30 | "三田店" |
20190922 | 1569110400 | "S14040" | 1102 | 1 | "CS040415000178" | "P070501004" | 1 | 128 | "長津田店" |
20170504 | 1493856000 | "S13020" | 1112 | 2 | "ZZ000000000000" | "P071302010" | 1 | 770 | "十条仲原店" |
20191010 | 1570665600 | "S14027" | 1102 | 1 | "CS027514000015" | "P071101003" | 1 | 680 | "南藤沢店" |
"
],
"text/plain": [
"shape: (10, 10)\n",
"┌───────────┬─────────────┬──────────┬────────────┬───┬────────────┬──────────┬────────┬────────────┐\n",
"│ sales_ymd ┆ sales_epoch ┆ store_cd ┆ receipt_no ┆ … ┆ product_cd ┆ quantity ┆ amount ┆ store_name │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ i64 ┆ str ┆ i64 ┆ ┆ str ┆ i64 ┆ i64 ┆ str │\n",
"╞═══════════╪═════════════╪══════════╪════════════╪═══╪════════════╪══════════╪════════╪════════════╡\n",
"│ 20181103 ┆ 1541203200 ┆ S14006 ┆ 112 ┆ … ┆ P070305012 ┆ 1 ┆ 158 ┆ 葛が谷店 │\n",
"│ 20181118 ┆ 1542499200 ┆ S13008 ┆ 1132 ┆ … ┆ P070701017 ┆ 1 ┆ 81 ┆ 成城店 │\n",
"│ 20170712 ┆ 1499817600 ┆ S14028 ┆ 1102 ┆ … ┆ P060101005 ┆ 1 ┆ 170 ┆ 二ツ橋店 │\n",
"│ 20190205 ┆ 1549324800 ┆ S14042 ┆ 1132 ┆ … ┆ P050301001 ┆ 1 ┆ 25 ┆ 新山下店 │\n",
"│ 20180821 ┆ 1534809600 ┆ S14025 ┆ 1102 ┆ … ┆ P060102007 ┆ 1 ┆ 90 ┆ 大和店 │\n",
"│ 20190605 ┆ 1559692800 ┆ S13003 ┆ 1112 ┆ … ┆ P050102002 ┆ 1 ┆ 138 ┆ 狛江店 │\n",
"│ 20181205 ┆ 1543968000 ┆ S14024 ┆ 1102 ┆ … ┆ P080101005 ┆ 1 ┆ 30 ┆ 三田店 │\n",
"│ 20190922 ┆ 1569110400 ┆ S14040 ┆ 1102 ┆ … ┆ P070501004 ┆ 1 ┆ 128 ┆ 長津田店 │\n",
"│ 20170504 ┆ 1493856000 ┆ S13020 ┆ 1112 ┆ … ┆ P071302010 ┆ 1 ┆ 770 ┆ 十条仲原店 │\n",
"│ 20191010 ┆ 1570665600 ┆ S14027 ┆ 1102 ┆ … ┆ P071101003 ┆ 1 ┆ 680 ┆ 南藤沢店 │\n",
"└───────────┴─────────────┴──────────┴────────────┴───┴────────────┴──────────┴────────┴────────────┘"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".join(\n",
" df_store.select('store_cd', 'store_name'), \n",
" on='store_cd'\n",
")\n",
".head(10)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "4f60f07b-98c6-405a-b4a5-9292176535c3",
"metadata": {},
"source": [
"## P-037\n",
"\n",
"商品データ(df_product)とカテゴリデータ(df_category)を内部結合し、商品データの全項目とカテゴリデータのカテゴリ小区分名(category_small_name)を10件表示せよ。"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "13c3f47c-05ec-4387-a2ec-e64bff66e44b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (10, 7)product_cd | category_major_cd | category_medium_cd | category_small_cd | unit_price | unit_cost | category_small_name |
---|
str | str | str | str | i64 | i64 | str |
"P040101001" | "04" | "0401" | "040101" | 198 | 149 | "弁当類" |
"P040101002" | "04" | "0401" | "040101" | 218 | 164 | "弁当類" |
"P040101003" | "04" | "0401" | "040101" | 230 | 173 | "弁当類" |
"P040101004" | "04" | "0401" | "040101" | 248 | 186 | "弁当類" |
"P040101005" | "04" | "0401" | "040101" | 268 | 201 | "弁当類" |
"P040101006" | "04" | "0401" | "040101" | 298 | 224 | "弁当類" |
"P040101007" | "04" | "0401" | "040101" | 338 | 254 | "弁当類" |
"P040101008" | "04" | "0401" | "040101" | 420 | 315 | "弁当類" |
"P040101009" | "04" | "0401" | "040101" | 498 | 374 | "弁当類" |
"P040101010" | "04" | "0401" | "040101" | 580 | 435 | "弁当類" |
"
],
"text/plain": [
"shape: (10, 7)\n",
"┌────────────┬───────────────────┬────────────────────┬───────────────────┬────────────┬───────────┬───────────────────┐\n",
"│ product_cd ┆ category_major_cd ┆ category_medium_cd ┆ category_small_cd ┆ unit_price ┆ unit_cost ┆ category_small_na │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ me │\n",
"│ str ┆ str ┆ str ┆ str ┆ i64 ┆ i64 ┆ --- │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ str │\n",
"╞════════════╪═══════════════════╪════════════════════╪═══════════════════╪════════════╪═══════════╪═══════════════════╡\n",
"│ P040101001 ┆ 04 ┆ 0401 ┆ 040101 ┆ 198 ┆ 149 ┆ 弁当類 │\n",
"│ P040101002 ┆ 04 ┆ 0401 ┆ 040101 ┆ 218 ┆ 164 ┆ 弁当類 │\n",
"│ P040101003 ┆ 04 ┆ 0401 ┆ 040101 ┆ 230 ┆ 173 ┆ 弁当類 │\n",
"│ P040101004 ┆ 04 ┆ 0401 ┆ 040101 ┆ 248 ┆ 186 ┆ 弁当類 │\n",
"│ P040101005 ┆ 04 ┆ 0401 ┆ 040101 ┆ 268 ┆ 201 ┆ 弁当類 │\n",
"│ P040101006 ┆ 04 ┆ 0401 ┆ 040101 ┆ 298 ┆ 224 ┆ 弁当類 │\n",
"│ P040101007 ┆ 04 ┆ 0401 ┆ 040101 ┆ 338 ┆ 254 ┆ 弁当類 │\n",
"│ P040101008 ┆ 04 ┆ 0401 ┆ 040101 ┆ 420 ┆ 315 ┆ 弁当類 │\n",
"│ P040101009 ┆ 04 ┆ 0401 ┆ 040101 ┆ 498 ┆ 374 ┆ 弁当類 │\n",
"│ P040101010 ┆ 04 ┆ 0401 ┆ 040101 ┆ 580 ┆ 435 ┆ 弁当類 │\n",
"└────────────┴───────────────────┴────────────────────┴───────────────────┴────────────┴───────────┴───────────────────┘"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_product\n",
".join(\n",
" df_category.select(\n",
" 'category_small_cd', \n",
" 'category_small_name'\n",
" ),\n",
" on='category_small_cd'\n",
")\n",
".head(10)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "1d71f22e-82dd-4535-9ec6-b46453862f5e",
"metadata": {},
"source": [
"## P-038\n",
"\n",
"顧客データ(df_customer)とレシート明細データ(df_receipt)から、顧客ごとの売上金額合計を求め、10件表示せよ。ただし、売上実績がない顧客については売上金額を0として表示させること。また、顧客は性別コード(gender_cd)が女性(1)であるものを対象とし、非会員(顧客IDが\"Z\"から始まるもの)は除外すること。"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "8804455b-973d-4148-b725-cd49c3609434",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (10, 2)customer_id | sum_amount |
---|
str | i64 |
"CS021313000114" | 0 |
"CS031415000172" | 5088 |
"CS028811000001" | 0 |
"CS001215000145" | 875 |
"CS015414000103" | 3122 |
"CS033513000180" | 868 |
"CS035614000014" | 0 |
"CS011215000048" | 3444 |
"CS009413000079" | 0 |
"CS040412000191" | 210 |
"
],
"text/plain": [
"shape: (10, 2)\n",
"┌────────────────┬────────────┐\n",
"│ customer_id ┆ sum_amount │\n",
"│ --- ┆ --- │\n",
"│ str ┆ i64 │\n",
"╞════════════════╪════════════╡\n",
"│ CS021313000114 ┆ 0 │\n",
"│ CS031415000172 ┆ 5088 │\n",
"│ CS028811000001 ┆ 0 │\n",
"│ CS001215000145 ┆ 875 │\n",
"│ CS015414000103 ┆ 3122 │\n",
"│ CS033513000180 ┆ 868 │\n",
"│ CS035614000014 ┆ 0 │\n",
"│ CS011215000048 ┆ 3444 │\n",
"│ CS009413000079 ┆ 0 │\n",
"│ CS040412000191 ┆ 210 │\n",
"└────────────────┴────────────┘"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_customer\n",
".lazy()\n",
".filter(\n",
" (pl.col.gender_cd == \"1\") & \n",
" (pl.col.customer_id.str.starts_with('Z').not_())\n",
")\n",
".select('customer_id')\n",
".join(\n",
" df_receipt\n",
" .lazy()\n",
" .group_by('customer_id')\n",
" .agg(\n",
" sum_amount = pl.col.amount.sum()\n",
" ),\n",
" on='customer_id',\n",
" how='left'\n",
")\n",
".with_columns(\n",
" pl.col.sum_amount.fill_null(0)\n",
")\n",
".head(10)\n",
".collect()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "a9436c54-16ab-4bef-99f9-5bdbe397388e",
"metadata": {},
"source": [
"## P-039\n",
"\n",
"レシート明細データ(df_receipt)から、売上日数の多い顧客の上位20件を抽出したデータと、売上金額合計の多い顧客の上位20件を抽出したデータをそれぞれ作成し、さらにその2つを完全外部結合せよ。ただし、非会員(顧客IDが\"Z\"から始まるもの)は除外すること。"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "92f8e801-e9c1-4bcc-8e26-115e5cf03a92",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (34, 3)customer_id | come_days | buy_amount |
---|
str | u32 | i64 |
"CS017415000097" | 20 | 23086 |
"CS015415000185" | 22 | 20153 |
"CS031414000051" | 19 | 19202 |
"CS028415000007" | 21 | 19127 |
"CS001605000009" | null | 18925 |
… | … | … |
"CS014214000023" | 19 | null |
"CS022515000028" | 18 | null |
"CS010214000002" | 21 | null |
"CS039414000052" | 19 | null |
"CS030214000008" | 18 | null |
"
],
"text/plain": [
"shape: (34, 3)\n",
"┌────────────────┬───────────┬────────────┐\n",
"│ customer_id ┆ come_days ┆ buy_amount │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ str ┆ u32 ┆ i64 │\n",
"╞════════════════╪═══════════╪════════════╡\n",
"│ CS017415000097 ┆ 20 ┆ 23086 │\n",
"│ CS015415000185 ┆ 22 ┆ 20153 │\n",
"│ CS031414000051 ┆ 19 ┆ 19202 │\n",
"│ CS028415000007 ┆ 21 ┆ 19127 │\n",
"│ CS001605000009 ┆ null ┆ 18925 │\n",
"│ … ┆ … ┆ … │\n",
"│ CS014214000023 ┆ 19 ┆ null │\n",
"│ CS022515000028 ┆ 18 ┆ null │\n",
"│ CS010214000002 ┆ 21 ┆ null │\n",
"│ CS039414000052 ┆ 19 ┆ null │\n",
"│ CS030214000008 ┆ 18 ┆ null │\n",
"└────────────────┴───────────┴────────────┘"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_data = df_receipt.filter(\n",
" pl.col.customer_id.str.starts_with('Z').not_()\n",
")\n",
"\n",
"group = df_data.group_by('customer_id')\n",
"\n",
"df_cnt = (\n",
" group\n",
" .agg(\n",
" come_days = pl.col.sales_ymd.n_unique()\n",
" )\n",
" .top_k(20, by='come_days')\n",
")\n",
"\n",
"df_sum = (\n",
" group\n",
" .agg(\n",
" buy_amount = pl.col.amount.sum()\n",
" )\n",
" .top_k(20, by='buy_amount')\n",
")\n",
"\n",
"df_cnt.join(df_sum, on='customer_id', how='full', coalesce=True)"
]
},
{
"cell_type": "markdown",
"id": "f12d65df-c76c-4060-b882-a0dfc3616c59",
"metadata": {},
"source": [
"## P-040\n",
"\n",
"全ての店舗と全ての商品を組み合わせたデータを作成したい。店舗データ(df_store)と商品データ(df_product)を直積し、件数を計算せよ。"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "a062bd63-fdde-49f8-a29a-abb89365115c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"531590"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_store\n",
".join(df_product, how='cross')\n",
".select(pl.len())\n",
".item()\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}