{
"cells": [
{
"cell_type": "markdown",
"id": "41fb9eda-8d3e-43e6-ac29-ed5bb545f0e3",
"metadata": {},
"source": [
"# データサイエンス100本ノック 51~60"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "bfa0c462-fa3a-443e-b011-1094b8e3d629",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"from helper.polars import load_100knocks_data\n",
"pl.Config.set_fmt_str_lengths(100)\n",
"df_customer, df_category, df_product, df_receipt, df_store, df_geocode = load_100knocks_data()"
]
},
{
"cell_type": "markdown",
"id": "ee2d7ba9-bb69-402a-80b9-df567d83592c",
"metadata": {},
"source": [
"## P-051\n",
"\n",
"レシート明細データ(df_receipt)の売上エポック秒を日付型に変換し、「日」だけ取り出してレシート番号(receipt_no)、レシートサブ番号(receipt_sub_no)とともに10件表示せよ。なお、「日」は0埋め2桁で取り出すこと。"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "24c9ca4c-247b-4494-b9ef-361e6660e07f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
shape: (5, 3)receipt_no | receipt_sub_no | sales_epoch |
---|
i64 | i64 | str |
112 | 1 | "03" |
1132 | 2 | "18" |
1102 | 1 | "12" |
1132 | 1 | "05" |
1102 | 2 | "21" |
"
],
"text/plain": [
"shape: (5, 3)\n",
"┌────────────┬────────────────┬─────────────┐\n",
"│ receipt_no ┆ receipt_sub_no ┆ sales_epoch │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ i64 ┆ str │\n",
"╞════════════╪════════════════╪═════════════╡\n",
"│ 112 ┆ 1 ┆ 03 │\n",
"│ 1132 ┆ 2 ┆ 18 │\n",
"│ 1102 ┆ 1 ┆ 12 │\n",
"│ 1132 ┆ 1 ┆ 05 │\n",
"│ 1102 ┆ 2 ┆ 21 │\n",
"└────────────┴────────────────┴─────────────┘"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".select(\n",
" pl.col('receipt_no', 'receipt_sub_no'),\n",
" pl.from_epoch(pl.col.sales_epoch).dt.day().cast(str).str.pad_start(2, '0')\n",
")\n",
".head()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "950c80dd-45ce-4955-9392-ad2ede5f3cc1",
"metadata": {},
"source": [
"## P-052\n",
"\n",
"レシート明細データ(df_receipt)の売上金額(amount)を顧客ID(customer_id)ごとに合計の上、売上金額合計に対して2,000円以下を0、2,000円より大きい金額を1に二値化し、顧客ID、売上金額合計とともに10件表示せよ。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8503167b-508d-425f-abad-d030e9f54394",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 3)customer_id | sum_amount | sales_flg |
---|
str | i64 | i32 |
"CS006214000001" | 7364 | 1 |
"CS008415000097" | 1895 | 0 |
"CS028414000014" | 6222 | 1 |
"CS025415000050" | 5736 | 1 |
"CS003515000195" | 5412 | 1 |
"
],
"text/plain": [
"shape: (5, 3)\n",
"┌────────────────┬────────────┬───────────┐\n",
"│ customer_id ┆ sum_amount ┆ sales_flg │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ str ┆ i64 ┆ i32 │\n",
"╞════════════════╪════════════╪═══════════╡\n",
"│ CS006214000001 ┆ 7364 ┆ 1 │\n",
"│ CS008415000097 ┆ 1895 ┆ 0 │\n",
"│ CS028414000014 ┆ 6222 ┆ 1 │\n",
"│ CS025415000050 ┆ 5736 ┆ 1 │\n",
"│ CS003515000195 ┆ 5412 ┆ 1 │\n",
"└────────────────┴────────────┴───────────┘"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".filter(\n",
" pl.col.customer_id.str.starts_with('Z').not_()\n",
")\n",
".group_by(\n",
" 'customer_id', maintain_order=True\n",
")\n",
".agg(\n",
" sum_amount=pl.col.amount.sum()\n",
")\n",
".with_columns(\n",
" sales_flg=pl.when(pl.col.sum_amount >= 2000).then(1).otherwise(0)\n",
")\n",
".head()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "11dff7e5-3e09-4bb5-844f-a2b807e49bc0",
"metadata": {},
"source": [
"## P-053\n",
"\n",
"顧客データ(df_customer)の郵便番号(postal_cd)に対し、東京(先頭3桁が100〜209のもの)を1、それ以外のものを0に二値化せよ。さらにレシート明細データ(df_receipt)と結合し、全期間において売上実績のある顧客数を、作成した二値ごとにカウントせよ。"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "bfddab04-13f3-4651-8602-751dd99ba3f3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (2, 2)postal_flg | customer_cnt |
---|
u8 | u32 |
0 | 3906 |
1 | 4400 |
"
],
"text/plain": [
"shape: (2, 2)\n",
"┌────────────┬──────────────┐\n",
"│ postal_flg ┆ customer_cnt │\n",
"│ --- ┆ --- │\n",
"│ u8 ┆ u32 │\n",
"╞════════════╪══════════════╡\n",
"│ 0 ┆ 3906 │\n",
"│ 1 ┆ 4400 │\n",
"└────────────┴──────────────┘"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".join(\n",
" df_customer\n",
" .select(\n",
" 'customer_id',\n",
" postal_flg=pl.col.postal_cd\n",
" .str.slice(0, 3)\n",
" .cast(pl.UInt16)\n",
" .is_between(100, 209)\n",
" .cast(pl.UInt8)\n",
" ),\n",
" on='customer_id',\n",
" how='inner'\n",
")\n",
".group_by(\n",
" 'postal_flg'\n",
")\n",
".agg(customer_cnt=pl.col.customer_id.n_unique())\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5d3afa15-0b21-49bf-bfcd-accfc960393b",
"metadata": {},
"source": [
"## P-054\n",
"\n",
"顧客データ(df_customer)の住所(address)は、埼玉県、千葉県、東京都、神奈川県のいずれかとなっている。都道府県毎にコード値を作成し、顧客ID、住所とともに10件表示せよ。値は埼玉県を11、千葉県を12、東京都を13、神奈川県を14とすること。"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3115bce7-8c3d-4dc6-8cb9-7cb7a6095a04",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 3)customer_id | address | prefecture_cd |
---|
str | str | str |
"CS021313000114" | "神奈川県伊勢原市粟窪**********" | "14" |
"CS037613000071" | "東京都江東区南砂**********" | "13" |
"CS031415000172" | "東京都渋谷区代々木**********" | "13" |
"CS028811000001" | "神奈川県横浜市泉区和泉町**********" | "14" |
"CS001215000145" | "東京都大田区仲六郷**********" | "13" |
"
],
"text/plain": [
"shape: (5, 3)\n",
"┌────────────────┬────────────────────────────────────┬───────────────┐\n",
"│ customer_id ┆ address ┆ prefecture_cd │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ str ┆ str ┆ str │\n",
"╞════════════════╪════════════════════════════════════╪═══════════════╡\n",
"│ CS021313000114 ┆ 神奈川県伊勢原市粟窪********** ┆ 14 │\n",
"│ CS037613000071 ┆ 東京都江東区南砂********** ┆ 13 │\n",
"│ CS031415000172 ┆ 東京都渋谷区代々木********** ┆ 13 │\n",
"│ CS028811000001 ┆ 神奈川県横浜市泉区和泉町********** ┆ 14 │\n",
"│ CS001215000145 ┆ 東京都大田区仲六郷********** ┆ 13 │\n",
"└────────────────┴────────────────────────────────────┴───────────────┘"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"names = ['埼玉県', '千葉県', '東京都', '神奈川県']\n",
"codes = [11, 12, 13, 14]\n",
"(\n",
"df_customer\n",
".select(\n",
" 'customer_id', 'address',\n",
" prefecture_cd=pl.col.address\n",
" .str.extract('^(.*?[都道府県])')\n",
" .replace(names, codes)\n",
")\n",
".head()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "6a0b61ab-024f-4fbe-b60c-39e5e0cbce0d",
"metadata": {},
"source": [
"## P-055\n",
"\n",
"レシート明細(df_receipt)データの売上金額(amount)を顧客ID(customer_id)ごとに合計し、その合計金額の四分位点を求めよ。その上で、顧客ごとの売上金額合計に対して以下の基準でカテゴリ値を作成し、顧客ID、売上金額合計とともに10件表示せよ。カテゴリ値は順に1〜4とする。\n",
"\n",
"- 最小値以上第1四分位未満 ・・・ 1を付与\n",
"- 第1四分位以上第2四分位未満 ・・・ 2を付与\n",
"- 第2四分位以上第3四分位未満 ・・・ 3を付与\n",
"- 第3四分位以上 ・・・ 4を付与"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "05eb3327-55ec-487f-96cc-023636a92d5d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (10, 3)customer_id | sum_amount | pct_group |
---|
str | i64 | cat |
"CS006214000001" | 7364 | "4" |
"CS008415000097" | 1895 | "3" |
"CS028414000014" | 6222 | "4" |
"ZZ000000000000" | 12395003 | "4" |
"CS025415000050" | 5736 | "4" |
"CS003515000195" | 5412 | "4" |
"CS024514000042" | 533 | "1" |
"CS040415000178" | 6149 | "4" |
"CS027514000015" | 2788 | "3" |
"CS025415000134" | 4902 | "4" |
"
],
"text/plain": [
"shape: (10, 3)\n",
"┌────────────────┬────────────┬───────────┐\n",
"│ customer_id ┆ sum_amount ┆ pct_group │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ str ┆ i64 ┆ cat │\n",
"╞════════════════╪════════════╪═══════════╡\n",
"│ CS006214000001 ┆ 7364 ┆ 4 │\n",
"│ CS008415000097 ┆ 1895 ┆ 3 │\n",
"│ CS028414000014 ┆ 6222 ┆ 4 │\n",
"│ ZZ000000000000 ┆ 12395003 ┆ 4 │\n",
"│ CS025415000050 ┆ 5736 ┆ 4 │\n",
"│ CS003515000195 ┆ 5412 ┆ 4 │\n",
"│ CS024514000042 ┆ 533 ┆ 1 │\n",
"│ CS040415000178 ┆ 6149 ┆ 4 │\n",
"│ CS027514000015 ┆ 2788 ┆ 3 │\n",
"│ CS025415000134 ┆ 4902 ┆ 4 │\n",
"└────────────────┴────────────┴───────────┘"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".group_by(\n",
" 'customer_id', maintain_order=True\n",
")\n",
".agg(\n",
" sum_amount=pl.col.amount.sum()\n",
")\n",
".with_columns(\n",
" pct_group=pl.col.sum_amount\n",
" .qcut([0.25, 0.5, 0.75], labels=['1', '2', '3', '4'])\n",
")\n",
".head(10)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "26ad2e7b-e38b-47ec-9561-891b6167eda7",
"metadata": {},
"source": [
"## P-056\n",
"\n",
"顧客データ(df_customer)の年齢(age)をもとに10歳刻みで年代を算出し、顧客ID(customer_id)、生年月日(birth_day)とともに10件表示せよ。ただし、60歳以上は全て60歳代とすること。年代を表すカテゴリ名は任意とする。"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "37fe27f0-f115-4a53-8074-fa2532c30b80",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (10, 3)customer_id | birth_day | age |
---|
str | str | i64 |
"CS021313000114" | "1981-04-29" | 30 |
"CS037613000071" | "1952-04-01" | 60 |
"CS031415000172" | "1976-10-04" | 40 |
"CS028811000001" | "1933-03-27" | 60 |
"CS001215000145" | "1995-03-29" | 20 |
"CS020401000016" | "1974-09-15" | 40 |
"CS015414000103" | "1977-08-09" | 40 |
"CS029403000008" | "1973-08-17" | 40 |
"CS015804000004" | "1931-05-02" | 60 |
"CS033513000180" | "1962-07-11" | 50 |
"
],
"text/plain": [
"shape: (10, 3)\n",
"┌────────────────┬────────────┬─────┐\n",
"│ customer_id ┆ birth_day ┆ age │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ str ┆ str ┆ i64 │\n",
"╞════════════════╪════════════╪═════╡\n",
"│ CS021313000114 ┆ 1981-04-29 ┆ 30 │\n",
"│ CS037613000071 ┆ 1952-04-01 ┆ 60 │\n",
"│ CS031415000172 ┆ 1976-10-04 ┆ 40 │\n",
"│ CS028811000001 ┆ 1933-03-27 ┆ 60 │\n",
"│ CS001215000145 ┆ 1995-03-29 ┆ 20 │\n",
"│ CS020401000016 ┆ 1974-09-15 ┆ 40 │\n",
"│ CS015414000103 ┆ 1977-08-09 ┆ 40 │\n",
"│ CS029403000008 ┆ 1973-08-17 ┆ 40 │\n",
"│ CS015804000004 ┆ 1931-05-02 ┆ 60 │\n",
"│ CS033513000180 ┆ 1962-07-11 ┆ 50 │\n",
"└────────────────┴────────────┴─────┘"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_customer\n",
".select(\n",
" 'customer_id', 'birth_day',\n",
" (pl.col.age // 10 * 10).clip(upper_bound=60)\n",
")\n",
".head(10)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "70b98456-c596-45d2-9771-b77b4f112081",
"metadata": {},
"source": [
"## P-057\n",
"\n",
"056の抽出結果と性別コード(gender_cd)により、新たに性別×年代の組み合わせを表すカテゴリデータを作成し、10件表示せよ。組み合わせを表すカテゴリの値は任意とする。"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ecdadaad-35f2-4b9e-b81f-8fefebcc2b9d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (10, 3)customer_id | birth_day | gender_era |
---|
str | str | str |
"CS021313000114" | "1981-04-29" | "130" |
"CS037613000071" | "1952-04-01" | "960" |
"CS031415000172" | "1976-10-04" | "140" |
"CS028811000001" | "1933-03-27" | "160" |
"CS001215000145" | "1995-03-29" | "120" |
"CS020401000016" | "1974-09-15" | "040" |
"CS015414000103" | "1977-08-09" | "140" |
"CS029403000008" | "1973-08-17" | "040" |
"CS015804000004" | "1931-05-02" | "060" |
"CS033513000180" | "1962-07-11" | "150" |
"
],
"text/plain": [
"shape: (10, 3)\n",
"┌────────────────┬────────────┬────────────┐\n",
"│ customer_id ┆ birth_day ┆ gender_era │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ str ┆ str ┆ str │\n",
"╞════════════════╪════════════╪════════════╡\n",
"│ CS021313000114 ┆ 1981-04-29 ┆ 130 │\n",
"│ CS037613000071 ┆ 1952-04-01 ┆ 960 │\n",
"│ CS031415000172 ┆ 1976-10-04 ┆ 140 │\n",
"│ CS028811000001 ┆ 1933-03-27 ┆ 160 │\n",
"│ CS001215000145 ┆ 1995-03-29 ┆ 120 │\n",
"│ CS020401000016 ┆ 1974-09-15 ┆ 040 │\n",
"│ CS015414000103 ┆ 1977-08-09 ┆ 140 │\n",
"│ CS029403000008 ┆ 1973-08-17 ┆ 040 │\n",
"│ CS015804000004 ┆ 1931-05-02 ┆ 060 │\n",
"│ CS033513000180 ┆ 1962-07-11 ┆ 150 │\n",
"└────────────────┴────────────┴────────────┘"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_customer\n",
".select(\n",
" 'customer_id', 'birth_day',\n",
" gender_era=\n",
" pl.col.gender_cd + \n",
" ((pl.col.age // 10 * 10).clip(upper_bound=60) + 100)\n",
" .cast(str)\n",
" .str.slice(1, 2)\n",
")\n",
".head(10)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "4ba2256b-fe63-4e8d-ae5b-310ed8347f08",
"metadata": {},
"source": [
"## P-058\n",
"\n",
"顧客データ(df_customer)の性別コード(gender_cd)をダミー変数化し、顧客ID(customer_id)とともに10件表示せよ。"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "c743f5b1-cdbb-4c56-99e9-c38c117b136c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 4)customer_id | gender_cd_0 | gender_cd_1 | gender_cd_9 |
---|
str | u8 | u8 | u8 |
"CS021313000114" | 0 | 1 | 0 |
"CS037613000071" | 0 | 0 | 1 |
"CS031415000172" | 0 | 1 | 0 |
"CS028811000001" | 0 | 1 | 0 |
"CS001215000145" | 0 | 1 | 0 |
"
],
"text/plain": [
"shape: (5, 4)\n",
"┌────────────────┬─────────────┬─────────────┬─────────────┐\n",
"│ customer_id ┆ gender_cd_0 ┆ gender_cd_1 ┆ gender_cd_9 │\n",
"│ --- ┆ --- ┆ --- ┆ --- │\n",
"│ str ┆ u8 ┆ u8 ┆ u8 │\n",
"╞════════════════╪═════════════╪═════════════╪═════════════╡\n",
"│ CS021313000114 ┆ 0 ┆ 1 ┆ 0 │\n",
"│ CS037613000071 ┆ 0 ┆ 0 ┆ 1 │\n",
"│ CS031415000172 ┆ 0 ┆ 1 ┆ 0 │\n",
"│ CS028811000001 ┆ 0 ┆ 1 ┆ 0 │\n",
"│ CS001215000145 ┆ 0 ┆ 1 ┆ 0 │\n",
"└────────────────┴─────────────┴─────────────┴─────────────┘"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"pl.concat([\n",
" df_customer.select('customer_id'), \n",
" df_customer.get_column('gender_cd').to_dummies()\n",
"], how='horizontal')\n",
".head()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "42bd40fb-f510-4814-809e-1ed291fc347e",
"metadata": {},
"source": [
"## P-059\n",
"\n",
"レシート明細データ(df_receipt)の売上金額(amount)を顧客ID(customer_id)ごとに合計し、売上金額合計を平均0、標準偏差1に標準化して顧客ID、売上金額合計とともに10件表示せよ。標準化に使用する標準偏差は、分散の平方根、もしくは不偏分散の平方根のどちらでも良いものとする。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。"
]
},
{
"cell_type": "markdown",
"id": "e12d3631-673b-44f9-a8d7-8456ba0616e3",
"metadata": {},
"source": [
"TIPS:\n",
"- query()の引数engineで'python'か'numexpr'かを選択でき、デフォルトはインストールされていればnumexprが、無ければpythonが使われます。さらに、文字列メソッドはengine='python'でないとquery()内で使えません。\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "a1a413da-3d50-4542-9326-86bf8dd2d590",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (10, 3)customer_id | sum_amount | std_amount |
---|
str | i64 | f64 |
"CS037411000055" | 226 | -0.85337 |
"CS005415000376" | 229 | -0.852268 |
"CS022613000082" | 1576 | -0.35717 |
"CS041411000001" | 9694 | 2.62665 |
"CS021515000208" | 9076 | 2.3995 |
"CS040311000030" | 323 | -0.817717 |
"CS018512000134" | 1053 | -0.549401 |
"CS010513000101" | 523 | -0.744206 |
"CS023514000110" | 6057 | 1.289849 |
"CS026513000002" | 692 | -0.682089 |
"
],
"text/plain": [
"shape: (10, 3)\n",
"┌────────────────┬────────────┬────────────┐\n",
"│ customer_id ┆ sum_amount ┆ std_amount │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ str ┆ i64 ┆ f64 │\n",
"╞════════════════╪════════════╪════════════╡\n",
"│ CS037411000055 ┆ 226 ┆ -0.85337 │\n",
"│ CS005415000376 ┆ 229 ┆ -0.852268 │\n",
"│ CS022613000082 ┆ 1576 ┆ -0.35717 │\n",
"│ CS041411000001 ┆ 9694 ┆ 2.62665 │\n",
"│ CS021515000208 ┆ 9076 ┆ 2.3995 │\n",
"│ CS040311000030 ┆ 323 ┆ -0.817717 │\n",
"│ CS018512000134 ┆ 1053 ┆ -0.549401 │\n",
"│ CS010513000101 ┆ 523 ┆ -0.744206 │\n",
"│ CS023514000110 ┆ 6057 ┆ 1.289849 │\n",
"│ CS026513000002 ┆ 692 ┆ -0.682089 │\n",
"└────────────────┴────────────┴────────────┘"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".filter(\n",
" pl.col.customer_id.str.starts_with('Z').not_() \n",
")\n",
".group_by('customer_id')\n",
".agg(\n",
" sum_amount=pl.col.amount.sum()\n",
")\n",
".with_columns(\n",
" std_amount=(pl.col.sum_amount - pl.col.sum_amount.mean()) / pl.col.sum_amount.std()\n",
")\n",
".head(10)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "a4785982-7a9c-4439-9d90-4630b27b8da6",
"metadata": {},
"source": [
"## P-060\n",
"\n",
"レシート明細データ(df_receipt)の売上金額(amount)を顧客ID(customer_id)ごとに合計し、売上金額合計を最小値0、最大値1に正規化して顧客ID、売上金額合計とともに10件表示せよ。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4483248f-34bf-4530-94d3-c81a6015d9fd",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (10, 3)customer_id | sum_amount | std_amount |
---|
str | i64 | f64 |
"CS024515000230" | 3041 | 0.129084 |
"CS034414000039" | 12270 | 0.530066 |
"CS026512000011" | 446 | 0.016336 |
"CS019415000280" | 6109 | 0.262383 |
"CS026412000156" | 1290 | 0.053007 |
"CS025513000185" | 1238 | 0.050747 |
"CS010115000005" | 3656 | 0.155805 |
"CS005515000343" | 1248 | 0.051182 |
"CS021613000009" | 1164 | 0.047532 |
"CS006314000011" | 1208 | 0.049444 |
"
],
"text/plain": [
"shape: (10, 3)\n",
"┌────────────────┬────────────┬────────────┐\n",
"│ customer_id ┆ sum_amount ┆ std_amount │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ str ┆ i64 ┆ f64 │\n",
"╞════════════════╪════════════╪════════════╡\n",
"│ CS024515000230 ┆ 3041 ┆ 0.129084 │\n",
"│ CS034414000039 ┆ 12270 ┆ 0.530066 │\n",
"│ CS026512000011 ┆ 446 ┆ 0.016336 │\n",
"│ CS019415000280 ┆ 6109 ┆ 0.262383 │\n",
"│ CS026412000156 ┆ 1290 ┆ 0.053007 │\n",
"│ CS025513000185 ┆ 1238 ┆ 0.050747 │\n",
"│ CS010115000005 ┆ 3656 ┆ 0.155805 │\n",
"│ CS005515000343 ┆ 1248 ┆ 0.051182 │\n",
"│ CS021613000009 ┆ 1164 ┆ 0.047532 │\n",
"│ CS006314000011 ┆ 1208 ┆ 0.049444 │\n",
"└────────────────┴────────────┴────────────┘"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".filter(\n",
" pl.col.customer_id.str.starts_with('Z').not_() \n",
")\n",
".group_by('customer_id')\n",
".agg(\n",
" sum_amount=pl.col.amount.sum()\n",
")\n",
".with_columns(\n",
" std_amount=(pl.col.sum_amount - pl.col.sum_amount.min()) / (pl.col.sum_amount.max() - pl.col.sum_amount.min())\n",
")\n",
".head(10)\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}