{
"cells": [
{
"cell_type": "markdown",
"id": "41fb9eda-8d3e-43e6-ac29-ed5bb545f0e3",
"metadata": {},
"source": [
"# データサイエンス100本ノック 71~80"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "bfa0c462-fa3a-443e-b011-1094b8e3d629",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"from helper.polars import load_100knocks_data\n",
"pl.Config.set_fmt_str_lengths(100)\n",
"df_customer, df_category, df_product, df_receipt, df_store, df_geocode = load_100knocks_data()"
]
},
{
"cell_type": "markdown",
"id": "8cecdb89-8586-4dea-bcfd-42c91326a035",
"metadata": {},
"source": [
"## P-071\n",
"\n",
"レシート明細データ(df_receipt)の売上日(sales_ymd)に対し、顧客データ(df_customer)の会員申込日(application_date)からの経過月数を計算し、顧客ID(customer_id)、売上日、会員申込日とともに10件表示せよ(sales_ymdは数値、application_dateは文字列でデータを保持している点に注意)。1ヶ月未満は切り捨てること。"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "7d16b70c-ce62-4d15-8d3c-e5ebb4c2a86c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
shape: (5, 4)customer_id | sales_ymd | application_date | elapsed_months |
---|
str | date | date | i32 |
"CS025115000002" | 2019-09-13 | 2016-01-16 | 44 |
"CS025304000004" | 2017-04-16 | 2016-01-04 | 15 |
"CS029415000098" | 2019-09-09 | 2015-07-13 | 50 |
"CS027414000125" | 2019-07-20 | 2015-10-15 | 45 |
"CS017315000003" | 2019-09-17 | 2015-07-02 | 50 |
"
],
"text/plain": [
"shape: (5, 4)\n",
"┌────────────────┬────────────┬──────────────────┬────────────────┐\n",
"│ customer_id ┆ sales_ymd ┆ application_date ┆ elapsed_months │\n",
"│ --- ┆ --- ┆ --- ┆ --- │\n",
"│ str ┆ date ┆ date ┆ i32 │\n",
"╞════════════════╪════════════╪══════════════════╪════════════════╡\n",
"│ CS025115000002 ┆ 2019-09-13 ┆ 2016-01-16 ┆ 44 │\n",
"│ CS025304000004 ┆ 2017-04-16 ┆ 2016-01-04 ┆ 15 │\n",
"│ CS029415000098 ┆ 2019-09-09 ┆ 2015-07-13 ┆ 50 │\n",
"│ CS027414000125 ┆ 2019-07-20 ┆ 2015-10-15 ┆ 45 │\n",
"│ CS017315000003 ┆ 2019-09-17 ┆ 2015-07-02 ┆ 50 │\n",
"└────────────────┴────────────┴──────────────────┴────────────────┘"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".select('customer_id', 'sales_ymd')\n",
".unique(keep='first')\n",
".join(df_customer, on='customer_id')\n",
".select(\n",
" pl.col.customer_id,\n",
" pl.col.sales_ymd.cast(str).str.strptime(pl.Date, '%Y%m%d'),\n",
" pl.col.application_date.cast(str).str.strptime(pl.Date, '%Y%m%d')\n",
")\n",
".with_columns(\n",
" elapsed_months=\n",
" (pl.col.sales_ymd.dt.year() - pl.col.application_date.dt.year()) * 12 +\n",
" (pl.col.sales_ymd.dt.month() - pl.col.application_date.dt.month())\n",
")\n",
".head()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f8b1ebad-69dc-46b1-bd9f-ad39def83f50",
"metadata": {},
"source": [
"## P-072\n",
"\n",
"レシート明細データ(df_receipt)の売上日(df_customer)に対し、顧客データ(df_customer)の会員申込日(application_date)からの経過年数を計算し、顧客ID(customer_id)、売上日、会員申込日とともに10件表示せよ(sales_ymdは数値、application_dateは文字列でデータを保持している点に注意)。1年未満は切り捨てること。"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "80a8f0b4-1ec2-41a9-a88d-8ac064e22ae0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 4)customer_id | sales_ymd | application_date | elapsed_years |
---|
str | date | date | i32 |
"CS038315000124" | 2017-09-17 | 2015-04-25 | 2 |
"CS022515000128" | 2018-02-08 | 2016-01-16 | 2 |
"CS045615000002" | 2019-08-08 | 2017-05-10 | 2 |
"CS016414000063" | 2019-06-17 | 2014-11-20 | 5 |
"CS008414000023" | 2019-06-12 | 2015-05-21 | 4 |
"
],
"text/plain": [
"shape: (5, 4)\n",
"┌────────────────┬────────────┬──────────────────┬───────────────┐\n",
"│ customer_id ┆ sales_ymd ┆ application_date ┆ elapsed_years │\n",
"│ --- ┆ --- ┆ --- ┆ --- │\n",
"│ str ┆ date ┆ date ┆ i32 │\n",
"╞════════════════╪════════════╪══════════════════╪═══════════════╡\n",
"│ CS038315000124 ┆ 2017-09-17 ┆ 2015-04-25 ┆ 2 │\n",
"│ CS022515000128 ┆ 2018-02-08 ┆ 2016-01-16 ┆ 2 │\n",
"│ CS045615000002 ┆ 2019-08-08 ┆ 2017-05-10 ┆ 2 │\n",
"│ CS016414000063 ┆ 2019-06-17 ┆ 2014-11-20 ┆ 5 │\n",
"│ CS008414000023 ┆ 2019-06-12 ┆ 2015-05-21 ┆ 4 │\n",
"└────────────────┴────────────┴──────────────────┴───────────────┘"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".select('customer_id', 'sales_ymd')\n",
".unique(keep='first')\n",
".join(df_customer, on='customer_id')\n",
".select(\n",
" pl.col.customer_id,\n",
" pl.col.sales_ymd.cast(str).str.strptime(pl.Date, '%Y%m%d'),\n",
" pl.col.application_date.cast(str).str.strptime(pl.Date, '%Y%m%d')\n",
")\n",
".with_columns(\n",
" elapsed_years=pl.col.sales_ymd.dt.year() - pl.col.application_date.dt.year()\n",
")\n",
".head()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "6137f06c-8a64-4bd1-bb5c-c90a5e89f207",
"metadata": {},
"source": [
"## P-073\n",
"\n",
"レシート明細データ(df_receipt)の売上日(sales_ymd)に対し、顧客データ(df_customer)の会員申込日(application_date)からのエポック秒による経過時間を計算し、顧客ID(customer_id)、売上日、会員申込日とともに10件表示せよ(なお、sales_ymdは数値、application_dateは文字列でデータを保持している点に注意)。なお、時間情報は保有していないため各日付は0時0分0秒を表すものとする。"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "dcde0667-2fa7-413c-a6f4-61757fe071ad",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 4)customer_id | sales_ymd | application_date | elapsed_years |
---|
str | date | date | i64 |
"CS020414000028" | 2017-03-02 | 2015-04-04 | 60307200 |
"CS039215000046" | 2017-03-23 | 2015-08-20 | 50198400 |
"CS035414000064" | 2017-08-10 | 2015-03-21 | 75427200 |
"CS019414000012" | 2018-02-01 | 2015-03-22 | 90460800 |
"CS035414000071" | 2017-01-29 | 2014-10-19 | 71971200 |
"
],
"text/plain": [
"shape: (5, 4)\n",
"┌────────────────┬────────────┬──────────────────┬───────────────┐\n",
"│ customer_id ┆ sales_ymd ┆ application_date ┆ elapsed_years │\n",
"│ --- ┆ --- ┆ --- ┆ --- │\n",
"│ str ┆ date ┆ date ┆ i64 │\n",
"╞════════════════╪════════════╪══════════════════╪═══════════════╡\n",
"│ CS020414000028 ┆ 2017-03-02 ┆ 2015-04-04 ┆ 60307200 │\n",
"│ CS039215000046 ┆ 2017-03-23 ┆ 2015-08-20 ┆ 50198400 │\n",
"│ CS035414000064 ┆ 2017-08-10 ┆ 2015-03-21 ┆ 75427200 │\n",
"│ CS019414000012 ┆ 2018-02-01 ┆ 2015-03-22 ┆ 90460800 │\n",
"│ CS035414000071 ┆ 2017-01-29 ┆ 2014-10-19 ┆ 71971200 │\n",
"└────────────────┴────────────┴──────────────────┴───────────────┘"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".select('customer_id', 'sales_ymd')\n",
".unique(keep='first')\n",
".join(df_customer, on='customer_id')\n",
".select(\n",
" pl.col.customer_id,\n",
" pl.col.sales_ymd.cast(str).str.strptime(pl.Date, '%Y%m%d'),\n",
" pl.col.application_date.cast(str).str.strptime(pl.Date, '%Y%m%d')\n",
")\n",
".with_columns(\n",
" elapsed_years=(pl.col.sales_ymd - pl.col.application_date).dt.total_seconds()\n",
")\n",
".head()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "7eae19e4-67ca-467c-b71d-7d067fd4e7fd",
"metadata": {},
"source": [
"## P-074\n",
"\n",
"レシート明細データ(df_receipt)の売上日(sales_ymd)に対し、当該週の月曜日からの経過日数を計算し、売上日、直前の月曜日付とともに10件表示せよ(sales_ymdは数値でデータを保持している点に注意)。"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "ae5907c7-0d88-46ed-acd9-b877fa541604",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 3)sales_ymd | monday | elapsed_days |
---|
date | date | i64 |
2018-11-03 | 2018-10-29 | 5 |
2018-11-18 | 2018-11-12 | 6 |
2017-07-12 | 2017-07-10 | 2 |
2018-08-21 | 2018-08-20 | 1 |
2019-06-05 | 2019-06-03 | 2 |
"
],
"text/plain": [
"shape: (5, 3)\n",
"┌────────────┬────────────┬──────────────┐\n",
"│ sales_ymd ┆ monday ┆ elapsed_days │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ date ┆ date ┆ i64 │\n",
"╞════════════╪════════════╪══════════════╡\n",
"│ 2018-11-03 ┆ 2018-10-29 ┆ 5 │\n",
"│ 2018-11-18 ┆ 2018-11-12 ┆ 6 │\n",
"│ 2017-07-12 ┆ 2017-07-10 ┆ 2 │\n",
"│ 2018-08-21 ┆ 2018-08-20 ┆ 1 │\n",
"│ 2019-06-05 ┆ 2019-06-03 ┆ 2 │\n",
"└────────────┴────────────┴──────────────┘"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".select('customer_id', 'sales_ymd')\n",
".join(df_customer, on='customer_id')\n",
".select(\n",
" pl.col.sales_ymd.cast(str).str.strptime(pl.Date, '%Y%m%d'),\n",
")\n",
".with_columns(\n",
" pl.col.sales_ymd.dt.truncate('1w').alias('monday')\n",
")\n",
".with_columns(\n",
" (pl.col.sales_ymd - pl.col.monday).alias('elapsed_days').dt.total_days()\n",
")\n",
".head()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "48d8869d-cfd6-4aff-b7e0-ab1a65d02ab8",
"metadata": {},
"source": [
"## P-075\n",
"\n",
"顧客データ(df_customer)からランダムに1%のデータを抽出し、先頭から10件表示せよ。"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "3d878e99-3443-4e15-a03c-b8de137acf43",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 11)customer_id | customer_name | gender_cd | gender | birth_day | age | postal_cd | address | application_store_cd | application_date | status_cd |
---|
str | str | str | str | str | i64 | str | str | str | i64 | str |
"CS017713000064" | "西谷 りえ" | "1" | "女性" | "1947-12-15" | 71 | "165-0027" | "東京都中野区野方**********" | "S13017" | 20150306 | "0-00000000-0" |
"CS026415000057" | "宮崎 あさみ" | "9" | "不明" | "1974-12-29" | 44 | "253-0013" | "神奈川県茅ヶ崎市赤松町**********" | "S14026" | 20150401 | "E-20100916-D" |
"CS027211000005" | "吉井 華子" | "1" | "女性" | "1995-11-20" | 23 | "245-0067" | "神奈川県横浜市戸塚区深谷町**********" | "S14027" | 20141101 | "6-20100831-9" |
"CS030514000050" | "真田 真帆" | "1" | "女性" | "1965-06-19" | 53 | "272-0031" | "千葉県市川市平田**********" | "S12030" | 20150703 | "B-20100729-A" |
"CS011705000002" | "島津 哲平" | "0" | "男性" | "1940-03-05" | 79 | "223-0065" | "神奈川県横浜市港北区高田東**********" | "S14011" | 20150812 | "0-00000000-0" |
"
],
"text/plain": [
"shape: (5, 11)\n",
"┌───────────────┬───────────────┬───────────┬────────┬───┬───────────────┬───────────────┬──────────────┬──────────────┐\n",
"│ customer_id ┆ customer_name ┆ gender_cd ┆ gender ┆ … ┆ address ┆ application_s ┆ application_ ┆ status_cd │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ tore_cd ┆ date ┆ --- │\n",
"│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ --- ┆ --- ┆ str │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ str ┆ i64 ┆ │\n",
"╞═══════════════╪═══════════════╪═══════════╪════════╪═══╪═══════════════╪═══════════════╪══════════════╪══════════════╡\n",
"│ CS01771300006 ┆ 西谷 りえ ┆ 1 ┆ 女性 ┆ … ┆ 東京都中野区 ┆ S13017 ┆ 20150306 ┆ 0-00000000-0 │\n",
"│ 4 ┆ ┆ ┆ ┆ ┆ 野方********* ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ * ┆ ┆ ┆ │\n",
"│ CS02641500005 ┆ 宮崎 あさみ ┆ 9 ┆ 不明 ┆ … ┆ 神奈川県茅ヶ ┆ S14026 ┆ 20150401 ┆ E-20100916-D │\n",
"│ 7 ┆ ┆ ┆ ┆ ┆ 崎市赤松町*** ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ******* ┆ ┆ ┆ │\n",
"│ CS02721100000 ┆ 吉井 華子 ┆ 1 ┆ 女性 ┆ … ┆ 神奈川県横浜 ┆ S14027 ┆ 20141101 ┆ 6-20100831-9 │\n",
"│ 5 ┆ ┆ ┆ ┆ ┆ 市戸塚区深谷 ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ 町********** ┆ ┆ ┆ │\n",
"│ CS03051400005 ┆ 真田 真帆 ┆ 1 ┆ 女性 ┆ … ┆ 千葉県市川市 ┆ S12030 ┆ 20150703 ┆ B-20100729-A │\n",
"│ 0 ┆ ┆ ┆ ┆ ┆ 平田********* ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ * ┆ ┆ ┆ │\n",
"│ CS01170500000 ┆ 島津 哲平 ┆ 0 ┆ 男性 ┆ … ┆ 神奈川県横浜 ┆ S14011 ┆ 20150812 ┆ 0-00000000-0 │\n",
"│ 2 ┆ ┆ ┆ ┆ ┆ 市港北区高田 ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ 東********** ┆ ┆ ┆ │\n",
"└───────────────┴───────────────┴───────────┴────────┴───┴───────────────┴───────────────┴──────────────┴──────────────┘"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_customer.sample(fraction=0.01).head(5)"
]
},
{
"cell_type": "markdown",
"id": "435025a6-9a46-448c-a109-2c5212d3dc49",
"metadata": {},
"source": [
"## P-076\n",
"\n",
"顧客データ(df_customer)から性別コード(gender_cd)の割合に基づきランダムに10%のデータを層化抽出し、性別コードごとに件数を集計せよ。"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "503f53a6-da3b-4d4c-9f39-a28dba0df94f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (3, 2)gender_cd | len |
---|
str | u32 |
"9" | 108 |
"1" | 1792 |
"0" | 299 |
"
],
"text/plain": [
"shape: (3, 2)\n",
"┌───────────┬──────┐\n",
"│ gender_cd ┆ len │\n",
"│ --- ┆ --- │\n",
"│ str ┆ u32 │\n",
"╞═══════════╪══════╡\n",
"│ 9 ┆ 108 │\n",
"│ 1 ┆ 1792 │\n",
"│ 0 ┆ 299 │\n",
"└───────────┴──────┘"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_customer\n",
".filter(\n",
" (pl.int_range(pl.len()) < pl.len() * 0.1).shuffle().over('gender_cd') == 1\n",
")\n",
".group_by('gender_cd')\n",
".agg(pl.len())\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b8cd72f9-78f1-4292-bd7c-f0f1c0ff5a91",
"metadata": {},
"source": [
"## P-077\n",
"\n",
"レシート明細データ(df_receipt)の売上金額を顧客単位に合計し、合計した売上金額の外れ値を抽出せよ。なお、外れ値は売上金額合計を対数化したうえで平均と標準偏差を計算し、その平均から3σを超えて離れたものとする(自然対数と常用対数のどちらでも可)。結果は10件表示せよ。"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "d236798c-4706-41e4-aa4e-0671065b5f00",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (1, 3)customer_id | sum_amount | log_sum_amount |
---|
str | i64 | f64 |
"ZZ000000000000" | 12395003 | 16.332804 |
"
],
"text/plain": [
"shape: (1, 3)\n",
"┌────────────────┬────────────┬────────────────┐\n",
"│ customer_id ┆ sum_amount ┆ log_sum_amount │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ str ┆ i64 ┆ f64 │\n",
"╞════════════════╪════════════╪════════════════╡\n",
"│ ZZ000000000000 ┆ 12395003 ┆ 16.332804 │\n",
"└────────────────┴────────────┴────────────────┘"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"df_receipt\n",
".group_by('customer_id')\n",
".agg(\n",
" sum_amount=pl.col.amount.sum(),\n",
" log_sum_amount=pl.col.amount.sum().log(),\n",
")\n",
".filter(\n",
" (pl.col.log_sum_amount - pl.col.log_sum_amount.mean()) / pl.col.log_sum_amount.std() > 3\n",
")\n",
")"
]
},
{
"cell_type": "markdown",
"id": "94cc7900-216a-4272-9b5f-095d1d9b8e35",
"metadata": {},
"source": [
"## P-078\n",
"\n",
"レシート明細データ(df_receipt)の売上金額(amount)を顧客単位に合計し、合計した売上金額の外れ値を抽出せよ。ただし、顧客IDが\"Z\"から始まるのものは非会員を表すため、除外して計算すること。なお、ここでは外れ値を第1四分位と第3四分位の差であるIQRを用いて、「第1四分位数-1.5×IQR」を下回るもの、または「第3四分位数+1.5×IQR」を超えるものとする。結果は10件表示せよ。"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "622ae623-f1c8-43b9-bc8a-fcc6fd09ab8e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 2)customer_id | sum_amount |
---|
str | i64 |
"CS026515000201" | 11144 |
"CS007615000046" | 8979 |
"CS021515000101" | 10401 |
"CS022515000119" | 8354 |
"CS025414000027" | 8603 |
"
],
"text/plain": [
"shape: (5, 2)\n",
"┌────────────────┬────────────┐\n",
"│ customer_id ┆ sum_amount │\n",
"│ --- ┆ --- │\n",
"│ str ┆ i64 │\n",
"╞════════════════╪════════════╡\n",
"│ CS026515000201 ┆ 11144 │\n",
"│ CS007615000046 ┆ 8979 │\n",
"│ CS021515000101 ┆ 10401 │\n",
"│ CS022515000119 ┆ 8354 │\n",
"│ CS025414000027 ┆ 8603 │\n",
"└────────────────┴────────────┘"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"col = pl.col.sum_amount\n",
"q25 = col.quantile(0.25)\n",
"q75 = col.quantile(0.75)\n",
"iqr = q75 - q25\n",
"expr = (col > q75 + iqr * 1.5) | (col < q25 - iqr * 1.5)\n",
"(\n",
"df_receipt\n",
".filter(\n",
" pl.col.customer_id.str.starts_with('Z').not_()\n",
")\n",
".group_by('customer_id')\n",
".agg(sum_amount=pl.col.amount.sum())\n",
".filter(expr)\n",
".head()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "15fdb979-2b21-4229-ac27-7dcb4e6fa6df",
"metadata": {},
"source": [
"## P-079\n",
"\n",
"商品データ(df_product)の各項目に対し、欠損数を確認せよ。"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "66e4cad8-7103-4f3b-ad4e-14684168602d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (1, 6)product_cd | category_major_cd | category_medium_cd | category_small_cd | unit_price | unit_cost |
---|
u32 | u32 | u32 | u32 | u32 | u32 |
0 | 0 | 0 | 0 | 7 | 7 |
"
],
"text/plain": [
"shape: (1, 6)\n",
"┌────────────┬───────────────────┬────────────────────┬───────────────────┬────────────┬───────────┐\n",
"│ product_cd ┆ category_major_cd ┆ category_medium_cd ┆ category_small_cd ┆ unit_price ┆ unit_cost │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ u32 ┆ u32 ┆ u32 ┆ u32 ┆ u32 ┆ u32 │\n",
"╞════════════╪═══════════════════╪════════════════════╪═══════════════════╪════════════╪═══════════╡\n",
"│ 0 ┆ 0 ┆ 0 ┆ 0 ┆ 7 ┆ 7 │\n",
"└────────────┴───────────────────┴────────────────────┴───────────────────┴────────────┴───────────┘"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_product.select(\n",
" pl.all().null_count()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "3c54d994-0010-42c9-9d07-497734616b65",
"metadata": {},
"source": [
"## P-080\n",
"\n",
"商品データ(df_product)のいずれかの項目に欠損が発生しているレコードを全て削除した新たな商品データを作成せよ。なお、削除前後の件数を表示させ、079で確認した件数だけ減少していることも確認すること。"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "c8e8a68f-839d-4f6a-95a3-1f157bab682b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(10030, 10023)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(df_product), len(df_product.drop_nulls())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}