{
"cells": [
{
"cell_type": "markdown",
"id": "85070315-e811-42e8-a0a0-c8b3c406dab6",
"metadata": {},
"source": [
"# DataFrame"
]
},
{
"cell_type": "markdown",
"id": "c81bd7ed-68a5-49ac-96da-734973786932",
"metadata": {},
"source": [
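"The DataFrame is the central data structure in Polars and represents two-dimensional data organized in rows and columns. All values in a column share one data type, while different columns can have different types."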
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c7385efc-9537-4d82-8e7d-713f5d18ceb8",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"from helper.jupyter import row"
]
},
{
"cell_type": "markdown",
"id": "ee1d168a-4f1d-4059-893c-4cbd4a9c298a",
"metadata": {},
"source": [
"## Creating a DataFrame\n",
"\n",
"This section explains how to create a DataFrame object from other Python objects."
]
},
{
"cell_type": "markdown",
"id": "ecce10ba-a945-4330-ba37-dec30370210f",
"metadata": {},
"source": [
"### Combining lists and dicts"
]
},
{
"cell_type": "markdown",
"id": "bf1572ab-1637-4453-b39f-e50600550191",
"metadata": {},
"source": [
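"The following program creates a `DataFrame` from several Python data structures (a list of dicts, a dict of lists, and a list of lists).\n",
"\n",
"1. List of dicts (`list[dict]`): the dict keys become the column names, and each dict in the list becomes one row.\n",
"2. Dict of lists (`dict[list]`): the dict keys become the column names, and the elements of each list become the values of that column.\n",
"3. List of lists (`list[list]`): the column names are given with the `schema` argument.\n",
" * `orient='row'` declares that the data is laid out row by row, so each inner list becomes one row.\n",
" * `orient='col'` declares that the data is laid out column by column, so each inner list becomes one column. In this example the values within an inner list have different types (a string and an integer), so `strict=False` is passed to enable automatic type conversion."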
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1f3ce2fa-510e-4be0-a758-43d841ea8526",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# list[dict]\n",
"dict_in_list = [\n",
" {\"name\": \"Alice\", \"age\": 30},\n",
" {\"name\": \"Bob\", \"age\": 25},\n",
" {\"name\": \"Charlie\", \"age\": 35}\n",
"]\n",
"df1 = pl.DataFrame(dict_in_list)\n",
"\n",
"# dict[list]\n",
"list_in_dict = {\n",
" \"name\": [\"Alice\", \"Bob\", \"Charlie\"],\n",
" \"age\": [30, 25, 35],\n",
"}\n",
"df2 = pl.DataFrame(list_in_dict)\n",
"\n",
"# list[list]\n",
"list_in_list = [\n",
" [\"Alice\", 30],\n",
" [\"Bob\", 25],\n",
" [\"Charlie\", 35]\n",
"]\n",
"columns = [\"name\", \"age\"]  # column names\n",
"df3 = pl.DataFrame(list_in_list, schema=columns, orient='row')\n",
"df4 = pl.DataFrame(list_in_list, schema=['p1', 'p2', 'p3'], orient='col', strict=False)\n",
"row(df1, df2, df3, df4)"
]
},
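{
"cell_type": "markdown",
"id": "sketch-orient-md",
"metadata": {},
"source": [
"As a quick check of the rules above, the following sketch builds the same table in several ways and compares the results with `DataFrame.equals()`. It assumes Polars 1.x. The variable names (`rows`, `by_row`, `by_col`, and so on) are illustrative only."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-orient-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"rows = [['Alice', 30], ['Bob', 25], ['Charlie', 35]]\n",
"\n",
"# orient='row': each inner list is one row -> 3 rows x 2 columns\n",
"by_row = pl.DataFrame(rows, schema=['name', 'age'], orient='row')\n",
"\n",
"# orient='col': each inner list is one column -> 2 rows x 3 columns;\n",
"# strict=False lets the mixed str/int values be cast to a common type\n",
"by_col = pl.DataFrame(rows, schema=['p1', 'p2', 'p3'], orient='col', strict=False)\n",
"\n",
"# The same table written as list[dict] and as dict[list]\n",
"from_dicts = pl.DataFrame([\n",
"    {'name': 'Alice', 'age': 30},\n",
"    {'name': 'Bob', 'age': 25},\n",
"    {'name': 'Charlie', 'age': 35},\n",
"])\n",
"from_lists = pl.DataFrame({'name': ['Alice', 'Bob', 'Charlie'], 'age': [30, 25, 35]})\n",
"\n",
"print(by_row.shape, by_col.shape)  # (3, 2) (2, 3)\n",
"print(by_row.equals(from_dicts), from_dicts.equals(from_lists))  # True True"
]
},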
{
"cell_type": "markdown",
"id": "84ad5590-8224-4d0e-b55b-77d302fc6c81",
"metadata": {},
"source": [
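"The `data` below is a dict containing the following:\n",
"\n",
" - The value of the `\"point\"` key is a list of dicts (`list[dict]`); each dict has the two keys `x` and `y`.\n",
" - The value of the `\"weight\"` key is a list of integers (`list[int]`).\n",
"\n",
"When this is converted to a DataFrame, the keys of the outer dict become the column names, and the elements of the `point` column are converted to the `Struct` type."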
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9f14e575-4397-4c4b-859f-db6c3970138b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
"shape: (3, 2)\n",
"┌───────────┬────────┐\n",
"│ point ┆ weight │\n",
"│ --- ┆ --- │\n",
"│ struct[2] ┆ i64 │\n",
"╞═══════════╪════════╡\n",
"│ {1,2} ┆ 5 │\n",
"│ {3,4} ┆ 4 │\n",
"│ {5,6} ┆ 8 │\n",
"└───────────┴────────┘"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = {\n",
" \"point\": [{\"x\": 1, \"y\": 2}, {\"x\": 3, \"y\": 4}, {\"x\": 5, \"y\": 6}],\n",
" \"weight\": [5, 4, 8],\n",
"}\n",
"pl.DataFrame(data)"
]
},
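{
"cell_type": "markdown",
"id": "sketch-struct-md",
"metadata": {},
"source": [
"The fields of a `Struct` column can also be accessed individually. This is a minimal sketch that assumes Polars 1.x. `Series.struct.field()` extracts one field as a Series, and `DataFrame.unnest()` replaces the struct column with one column per field."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-struct-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"data = {\n",
"    'point': [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}, {'x': 5, 'y': 6}],\n",
"    'weight': [5, 4, 8],\n",
"}\n",
"df = pl.DataFrame(data)\n",
"\n",
"# One field of the struct column, as a Series\n",
"xs = df['point'].struct.field('x')\n",
"print(xs.to_list())  # [1, 3, 5]\n",
"\n",
"# Replace the struct column with one column per field\n",
"flat = df.unnest('point')\n",
"print(flat.columns)  # ['x', 'y', 'weight']"
]
},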
{
"cell_type": "markdown",
"id": "68739442-76a4-49cf-84de-3e666235504c",
"metadata": {},
"source": [
"### NumPy arrays"
]
},
{
"cell_type": "markdown",
"id": "89dcd835-c290-4587-a174-7543ba5c9b32",
"metadata": {},
"source": [
"When working with NumPy arrays, lists and NumPy arrays are interchangeable as follows:\n",
"\n",
"* `dict[list]` and `dict[1-D array]` are treated the same.\n",
"* `list[list]` and a 2-D array are treated the same."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "c761d1cb-c5d3-4669-ad59-da095ce5e16f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import numpy as np\n",
"array_in_dict = {\n",
" \"x\": np.array([1, 3, 5]),\n",
" \"y\": np.array([2, 4, 6]),\n",
"}\n",
"\n",
"df1 = pl.DataFrame(array_in_dict)\n",
"\n",
"array_2d = np.array([[1, 2], [3, 4], [5, 6]])\n",
"df2 = pl.DataFrame(array_2d, schema=['x', 'y'], orient='row')\n",
"df3 = pl.DataFrame(array_2d, schema=['p1', 'p2', 'p3'], orient='col')\n",
"row(df1, df2, df3)"
]
},
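{
"cell_type": "markdown",
"id": "sketch-numpy2d-md",
"metadata": {},
"source": [
"To check that a 2-D NumPy array is handled like the equivalent `list[list]`, the sketch below builds both and compares their rows. It assumes Polars 1.x. Only the values are compared, because NumPy's default integer type can depend on the platform, and that changes the column dtype."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-numpy2d-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import polars as pl\n",
"\n",
"array_2d = np.array([[1, 2], [3, 4], [5, 6]])\n",
"\n",
"# orient='row': each row of the array becomes a DataFrame row\n",
"from_array = pl.DataFrame(array_2d, schema=['x', 'y'], orient='row')\n",
"from_lists = pl.DataFrame(array_2d.tolist(), schema=['x', 'y'], orient='row')\n",
"print(from_array.rows() == from_lists.rows())  # True\n",
"\n",
"# orient='col': each row of the array becomes a column\n",
"by_col = pl.DataFrame(array_2d, schema=['p1', 'p2', 'p3'], orient='col')\n",
"print(by_col.shape, by_col['p1'].to_list())  # (2, 3) [1, 2]"
]
},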
{
"cell_type": "markdown",
"id": "f21f1c4f-4539-4970-91f2-af6ca3cb244c",
"metadata": {},
"source": [
"When a 1-D structured array is converted to a DataFrame, each field of the array becomes a column of the DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "795cbc29-23a2-4dc2-9001-12d34cd35c67",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
"shape: (3, 2)\n",
"┌─────┬─────┐\n",
"│ x ┆ y │\n",
"│ --- ┆ --- │\n",
"│ i16 ┆ i16 │\n",
"╞═════╪═════╡\n",
"│ 1 ┆ 30 │\n",
"│ 2 ┆ 25 │\n",
"│ 3 ┆ 35 │\n",
"└─────┴─────┘"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"arr = np.array([\n",
" (1, 30),\n",
" (2, 25),\n",
" (3, 35)], dtype=[('x', 'i2'), ('y', 'i2')])\n",
"\n",
"pl.DataFrame(arr)"
]
},
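{
"cell_type": "markdown",
"id": "sketch-structured-md",
"metadata": {},
"source": [
"The conversion also works in the other direction through `DataFrame.to_numpy(structured=True)`, which this notebook covers in the `to_numpy()` section. Below is a short round-trip sketch that assumes Polars 1.x. The NumPy field type `i2` becomes `Int16`, and the field names come back from the column names."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-structured-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import polars as pl\n",
"\n",
"# A 1-D structured array with two int16 fields, x and y\n",
"arr = np.array([(1, 30), (2, 25), (3, 35)], dtype=[('x', 'i2'), ('y', 'i2')])\n",
"\n",
"df = pl.DataFrame(arr)\n",
"print(df.columns, df.dtypes)  # ['x', 'y'] [Int16, Int16]\n",
"\n",
"# Back to a structured array; the field names come from the column names\n",
"back = df.to_numpy(structured=True)\n",
"print(back.dtype.names)    # ('x', 'y')\n",
"print(back['y'].tolist())  # [30, 25, 35]"
]
},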
{
"cell_type": "markdown",
"id": "c3e7beb0-be74-4f65-aa78-4f9c0c0b5a69",
"metadata": {},
"source": [
"### Data containing Series"
]
},
{
"cell_type": "markdown",
"id": "76949b0a-54b6-4134-9828-063d1ad1993d",
"metadata": {},
"source": [
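"When working with `pl.Series`, it is common to convert a `list[Series]` or a `dict[Series]` into a DataFrame. In both cases each `Series` becomes a column of the DataFrame, but the column names are determined differently:\n",
"\n",
"- `list[Series]`: the name of each `Series` is used as the column name.\n",
"- `dict[Series]`: the dict keys are used as the column names."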
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a3fe401b-d85c-4059-829f-6142f97d746e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sx = pl.Series('x', [1, 2, 3])\n",
"sy = pl.Series('y', [4, 5, 6])\n",
"\n",
"df1 = pl.DataFrame([sx, sy])\n",
"df2 = pl.DataFrame({'A':sx, 'B':sy})\n",
"row(df1, df2)"
]
},
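{
"cell_type": "markdown",
"id": "sketch-series-names-md",
"metadata": {},
"source": [
"Here is a minimal check of the naming rule, assuming Polars 1.x:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-series-names-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"sx = pl.Series('x', [1, 2, 3])\n",
"sy = pl.Series('y', [4, 5, 6])\n",
"\n",
"# list[Series]: the Series names become the column names\n",
"df1 = pl.DataFrame([sx, sy])\n",
"print(df1.columns)  # ['x', 'y']\n",
"\n",
"# dict[Series]: the dict keys become the column names\n",
"df2 = pl.DataFrame({'A': sx, 'B': sy})\n",
"print(df2.columns)  # ['A', 'B']\n",
"print(df2['B'].to_list())  # [4, 5, 6]"
]
},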
{
"cell_type": "markdown",
"id": "a757d2c6-a311-4162-89ea-99b43e1e7ac1",
"metadata": {},
"source": [
"### pl.from_*() functions"
]
},
{
"cell_type": "markdown",
"id": "add40361-d2d0-449c-9e35-bb982e9d5258",
"metadata": {},
"source": [
"Functions whose names start with `from_` convert various kinds of data into a DataFrame. Each function is meant for one specific input format, so using them makes unintended conversions less likely and the code more robust.\n",
"\n",
"- `pl.from_dict()`: converts from `dict[list]` data\n",
"- `pl.from_dicts()`: converts from `list[dict]` data\n",
"- `pl.from_numpy()`: converts from a NumPy array\n",
"- `pl.from_records()`: converts from `list[list]` data\n",
"- `pl.from_pandas()`: converts from a pandas `DataFrame` object\n",
"- `pl.from_arrow()`: converts from a pyarrow `Array` or `Table` object"
]
},
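{
"cell_type": "markdown",
"id": "sketch-from-funcs-md",
"metadata": {},
"source": [
"The first four functions can be sketched as follows. pandas and pyarrow are left out so that only Polars and NumPy are needed. Each call builds a small table from the input format it is meant for. This assumes Polars 1.x, where `pl.from_records()` and `pl.from_numpy()` accept the same `schema` and `orient` arguments as the `DataFrame` constructor."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-from-funcs-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import polars as pl\n",
"\n",
"df_dict = pl.from_dict({'name': ['Alice', 'Bob'], 'age': [30, 25]})  # dict[list]\n",
"df_dicts = pl.from_dicts([{'name': 'Alice', 'age': 30},\n",
"                          {'name': 'Bob', 'age': 25}])  # list[dict]\n",
"df_records = pl.from_records([['Alice', 30], ['Bob', 25]],\n",
"                             schema=['name', 'age'], orient='row')  # list[list]\n",
"print(df_dict.equals(df_dicts), df_dict.equals(df_records))  # True True\n",
"\n",
"# from_numpy: a 2-D numeric array, one array row per DataFrame row\n",
"df_np = pl.from_numpy(np.array([[1, 2], [3, 4]]), schema=['x', 'y'], orient='row')\n",
"print(df_np.rows())  # [(1, 2), (3, 4)]"
]
},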
{
"cell_type": "markdown",
"id": "bc4968b1-6e92-44ae-94f4-a5073c90e4bb",
"metadata": {},
"source": [
"## DataFrame attributes"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "e3fca09d-55b2-48ff-8bdb-7574cbdc9b7e",
"metadata": {},
"outputs": [],
"source": [
"df = pl.DataFrame(\n",
" {\n",
" \"a\": [3, 3, 3, 4],\n",
" \"b\": [4.0, 12, 6, 7],\n",
" \"g\": ['A', 'B', 'A', 'B']\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "71c1bb23-cb89-4a3d-8181-8b674d09b574",
"metadata": {},
"source": [
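"The `shape` attribute returns the shape of the DataFrame as (height, width). The height and width are also available individually through the `height` and `width` attributes, and the `len()` function returns the height as well."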
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "bab3d3ae-fe6a-4d2d-9d29-ed522cf29614",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"df.shape = (4, 3)\n",
"df.height = 4\n",
"df.width = 3\n",
"len(df) = 4\n"
]
}
],
"source": [
"print(f\"{df.shape = }\")\n",
"print(f\"{df.height = }\")\n",
"print(f\"{df.width = }\")\n",
"print(f\"{len(df) = }\")"
]
},
{
"cell_type": "markdown",
"id": "70e1a124-87d0-46fc-9e16-b9a9b49e410f",
"metadata": {},
"source": [
"The `columns` attribute returns the list of column names."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "b0351aa8-a032-442d-9751-14ba1f4f879d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['a', 'b', 'g']"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"id": "8f0e1417-b9b0-4d2d-94c2-ba921ff471ee",
"metadata": {},
"source": [
"The `schema` attribute returns a dict-like `Schema` object that maps each column name to its data type."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "c99da371-503f-4bf4-be58-def65766d4fd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Schema([('a', Int64), ('b', Float64), ('g', String)])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.schema"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "05840e3d-318b-44b8-82fd-528ad7e2d28f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"String"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.schema['g']"
]
},
{
"cell_type": "markdown",
"id": "54e739c7-ff40-42e1-ad4c-4079244db783",
"metadata": {},
"source": [
"The `dtypes` attribute returns a list of the data types of the columns, in column order."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "d5c33b09-4580-41c9-963f-e34ff051e804",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Int64, Float64, String]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
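{
"cell_type": "markdown",
"id": "sketch-schema-md",
"metadata": {},
"source": [
"`columns`, `dtypes` and `schema` hold the same information in different forms. The small sketch below assumes Polars 1.x, where `Schema` behaves like an ordered dict. It shows that pairing `columns` with `dtypes` reproduces the items of `schema`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-schema-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"df = pl.DataFrame({\n",
"    'a': [3, 3, 3, 4],\n",
"    'b': [4.0, 12, 6, 7],\n",
"    'g': ['A', 'B', 'A', 'B'],\n",
"})\n",
"\n",
"# (name, dtype) pairs built from the two list attributes\n",
"pairs = list(zip(df.columns, df.dtypes))\n",
"\n",
"# The same pairs taken from the dict-like schema\n",
"print(pairs == list(df.schema.items()))  # True\n",
"\n",
"# Looking up one dtype by column name\n",
"print(df.schema['b'])  # Float64"
]
},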
{
"cell_type": "markdown",
"id": "2d66602f-2c12-4b35-8b33-002fa668a37e",
"metadata": {},
"source": [
"The `flags` attribute returns the sort status of each column."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "0a654663-e837-4adc-a834-ce760f52f50e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'a': {'SORTED_ASC': False, 'SORTED_DESC': False},\n",
" 'b': {'SORTED_ASC': False, 'SORTED_DESC': False},\n",
" 'g': {'SORTED_ASC': False, 'SORTED_DESC': False}}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.flags"
]
},
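{
"cell_type": "markdown",
"id": "sketch-flags-md",
"metadata": {},
"source": [
"The flags record what Polars *knows* about the order of a column, not something it checks on construction. Column `a` above is in ascending order, yet its flag is `False`. The minimal sketch below assumes Polars 1.x and shows two ways the flags get set. The first is sorting. The second is declaring the order with `Series.set_sorted()`, which does not verify the data, so it should only be used on data that really is sorted."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-flags-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"s = pl.Series('a', [3, 1, 2])\n",
"print(s.flags)  # {'SORTED_ASC': False, 'SORTED_DESC': False}\n",
"\n",
"# Sorting records the order in the flags\n",
"s_sorted = s.sort()\n",
"print(s_sorted.flags['SORTED_ASC'])  # True\n",
"\n",
"# set_sorted() only declares the order; it neither checks nor reorders the data\n",
"t = pl.Series('b', [3, 2, 1]).set_sorted(descending=True)\n",
"print(t.flags['SORTED_DESC'])  # True"
]
},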
{
"cell_type": "markdown",
"id": "97789e60-d5ae-442f-a7d2-e9e6fdce8f0b",
"metadata": {},
"source": [
"## Retrieving data from a DataFrame"
]
},
{
"cell_type": "markdown",
"id": "bee5cfbd-67ba-4111-94c9-2d0b8100ded8",
"metadata": {},
"source": [
"This section explains how to retrieve columns, rows, or single values from a DataFrame."
]
},
{
"cell_type": "markdown",
"id": "cb3fb550-ab0d-4250-b798-986736c28adc",
"metadata": {},
"source": [
"### Retrieving columns"
]
},
{
"cell_type": "markdown",
"id": "6da287a3-53b2-4e4c-853a-aa216630db8b",
"metadata": {},
"source": [
"Polars offers several ways to retrieve a column of a DataFrame as a Series:\n",
"\n",
"* `DataFrame.to_series()`: retrieves a column by its index.\n",
"* `DataFrame.get_column()`: retrieves a column by its name.\n",
"* `DataFrame.get_columns()`: retrieves all columns.\n",
"* `DataFrame.iter_columns()`: returns an iterator over the columns.\n",
"\n",
"`DataFrame.to_series()` returns the column at the given index as a Series, and `DataFrame.get_column()` returns the column with the given name. A Series can also be retrieved with dict-style indexing by column name, as in `DataFrame[\"column_name\"]`."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "3dc0cb3a-d39d-42c2-bd2b-e0b63a8c851b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"s1 = df.to_series(0)\n",
"s2 = df.get_column('b')\n",
"s3 = df['g']\n",
"row(s1, s2, s3)"
]
},
{
"cell_type": "markdown",
"id": "977a4de8-d3a1-4b90-a5d9-c7cab592670b",
"metadata": {},
"source": [
"The `DataFrame.get_columns()` method returns all columns of the DataFrame as a list of Series."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "e4d3455e-7ac0-48b2-b2c6-128419babdb6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sa, sb, sg = df.get_columns()\n",
"row(sa, sb, sg)"
]
},
{
"cell_type": "markdown",
"id": "5215980b-6a4d-4d52-8d50-70a20cfb3b2f",
"metadata": {},
"source": [
"`DataFrame.iter_columns()` yields the columns of the DataFrame one at a time."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "f1a5d14f-bbf1-4c31-b0ea-8a10aef5d399",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a [3, 3, 3, 4]\n",
"b [4.0, 12.0, 6.0, 7.0]\n",
"g ['A', 'B', 'A', 'B']\n"
]
}
],
"source": [
"for col in df.iter_columns():\n",
" print(col.name, col.to_list())"
]
},
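{
"cell_type": "markdown",
"id": "sketch-column-access-md",
"metadata": {},
"source": [
"The column-access methods above return the same data. The small sketch below assumes Polars 1.x. It compares the methods with `Series.equals()` and rebuilds the frame from its columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-column-access-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"df = pl.DataFrame({\n",
"    'a': [3, 3, 3, 4],\n",
"    'b': [4.0, 12, 6, 7],\n",
"    'g': ['A', 'B', 'A', 'B'],\n",
"})\n",
"\n",
"# By index, by name, and by dict-style indexing: the same Series\n",
"print(df.to_series(0).equals(df.get_column('a')))  # True\n",
"print(df.get_column('a').equals(df['a']))          # True\n",
"\n",
"# get_columns() returns every column; a list[Series] rebuilds the frame\n",
"print(pl.DataFrame(df.get_columns()).equals(df))   # True\n",
"\n",
"# iter_columns() yields the columns one by one\n",
"as_dict = {s.name: s.to_list() for s in df.iter_columns()}\n",
"print(as_dict == df.to_dict(as_series=False))      # True"
]
},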
{
"cell_type": "markdown",
"id": "c0ef13fe-d21f-469c-b9b3-524f7bc7f747",
"metadata": {},
"source": [
"### Series objects"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "5b34d4dd-b56a-41ee-9556-1d4483ee7fdd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"s1.name = 'a'\n",
"s1.dtype = Int64\n",
"s1.flags = {'SORTED_ASC': False, 'SORTED_DESC': False}\n",
"s1.shape = (4,)\n",
"len(s1) = 4\n"
]
}
],
"source": [
"print(f\"{s1.name = }\")\n",
"print(f\"{s1.dtype = }\")\n",
"print(f\"{s1.flags = }\")\n",
"print(f\"{s1.shape = }\")\n",
"print(f\"{len(s1) = }\")"
]
},
{
"cell_type": "markdown",
"id": "a462d925-7ca0-41e1-afab-c34f6e8f811c",
"metadata": {},
"source": [
"### The to_numpy() method"
]
},
{
"cell_type": "markdown",
"id": "34e926d3-f939-4863-82a6-ecdeffc38be3",
"metadata": {},
"source": [
"The `Series.to_numpy()` and `Series.to_list()` methods convert a `Series` object into a NumPy array and a Python list, respectively."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "3a795f08-d08f-48da-9307-8373286b2df9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"sa.to_numpy() = array([3, 3, 3, 4], dtype=int64)\n",
"sa.to_list() = [3, 3, 3, 4]\n"
]
}
],
"source": [
"print(f'{sa.to_numpy() = }')\n",
"print(f'{sa.to_list() = }')"
]
},
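{
"cell_type": "markdown",
"id": "sketch-to-numpy-nulls-md",
"metadata": {},
"source": [
"The two methods differ when the Series contains missing values. In Polars 1.x, `to_numpy()` turns an integer Series with nulls into a floating-point array with `nan` in place of each null, while `to_list()` keeps `None`. Here is a minimal sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-to-numpy-nulls-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import polars as pl\n",
"\n",
"s = pl.Series('n', [1, None, 3])\n",
"\n",
"# Integers with a null cannot stay integers in NumPy, so the array is float64\n",
"arr = s.to_numpy()\n",
"print(arr.dtype, np.isnan(arr[1]))  # float64 True\n",
"\n",
"# to_list() keeps the null as None\n",
"lst = s.to_list()\n",
"print(lst)  # [1, None, 3]"
]
},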
{
"cell_type": "markdown",
"id": "8fd6ac81-13c8-4e2e-b914-299a874b4e5c",
"metadata": {},
"source": [
"`DataFrame.to_numpy()` converts a DataFrame into a 2-D NumPy array. By default, all values are converted to their common supertype; when numbers and strings are mixed, the result is an array with `dtype=object`."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "0b1927c5-1127-44a8-9a06-dff3b298f1d2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[3, 4.0, 'A'],\n",
" [3, 12.0, 'B'],\n",
" [3, 6.0, 'A'],\n",
" [4, 7.0, 'B']], dtype=object)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.to_numpy()"
]
},
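{
"cell_type": "markdown",
"id": "sketch-to-numpy-dtype-md",
"metadata": {},
"source": [
"Selecting only the numeric columns first avoids the `object` array. The sketch below assumes Polars 1.x. `Int64` and `Float64` have the common supertype `Float64`, so the result is a `float64` array."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-to-numpy-dtype-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"df = pl.DataFrame({\n",
"    'a': [3, 3, 3, 4],\n",
"    'b': [4.0, 12, 6, 7],\n",
"    'g': ['A', 'B', 'A', 'B'],\n",
"})\n",
"\n",
"print(df.to_numpy().dtype)                   # object (numbers and strings mixed)\n",
"print(df.select('a', 'b').to_numpy().dtype)  # float64 (Int64 and Float64 -> Float64)\n",
"print(df.select('a').to_numpy().dtype)       # int64"
]
},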
{
"cell_type": "markdown",
"id": "9ac10ad4-0828-49a5-82da-316b55606115",
"metadata": {},
"source": [
"Setting the `structured` argument to `True` converts the DataFrame into a structured array instead."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "15b6d758-3880-40bf-8240-4067a676dd59",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([(3, 4., 'A'), (3, 12., 'B'), (3, 6., 'A'), (4, 7., 'B')],\n",
" dtype=[('a', '"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"row(df[0], df[[0, 2]], df[1:3])"
]
},
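{
"cell_type": "markdown",
"id": "sketch-row-select-md",
"metadata": {},
"source": [
"This minimal sketch covers row selection with a single index. It assumes Polars 1.x and a four-row frame with the columns `a` to `d`, whose values match the outputs shown in this section. An integer, a list of integers, or an integer slice selects rows and returns a `DataFrame`, even for a single row."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-row-select-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"df = pl.DataFrame({\n",
"    'a': [1, 2, 3, 4],\n",
"    'b': [5, 6, 7, 8],\n",
"    'c': ['x', 'y', 'z', 'w'],\n",
"    'd': ['a', 'b', 'c', 'd'],\n",
"})\n",
"\n",
"first = df[0]        # one row, still a DataFrame\n",
"picked = df[[0, 2]]  # rows 0 and 2\n",
"middle = df[1:3]     # rows 1 and 2 (the end of an integer slice is excluded)\n",
"\n",
"print(type(first).__name__, first.shape)  # DataFrame (1, 4)\n",
"print(picked['a'].to_list())              # [1, 3]\n",
"print(middle['a'].to_list())              # [2, 3]"
]
},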
{
"cell_type": "markdown",
"id": "66354118-a328-44d7-9d83-1ddcf04bec78",
"metadata": {},
"source": [
"2. A string, a list of strings, or a slice of column names **selects columns**. For a slice of names, the end column is included."
]
},
{
"cell_type": "code",
"execution_count": 71,
"id": "bce620e0-6579-434d-858c-629a983931f2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"row(df[\"a\"], df[[\"a\", \"d\"]], df[\"a\":\"c\"])"
]
},
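{
"cell_type": "markdown",
"id": "sketch-col-select-md",
"metadata": {},
"source": [
"This minimal sketch covers column selection with strings. It assumes Polars 1.x and the same four-column frame. A single name returns a `Series`, while a list of names or a name slice returns a `DataFrame`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-col-select-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"df = pl.DataFrame({\n",
"    'a': [1, 2, 3, 4],\n",
"    'b': [5, 6, 7, 8],\n",
"    'c': ['x', 'y', 'z', 'w'],\n",
"    'd': ['a', 'b', 'c', 'd'],\n",
"})\n",
"\n",
"print(type(df['a']).__name__)  # Series (a single name gives a Series)\n",
"print(df[['a', 'd']].columns)  # ['a', 'd']\n",
"print(df['a':'c'].columns)     # ['a', 'b', 'c'] (the end label is included)"
]
},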
{
"cell_type": "markdown",
"id": "a5c0e9a9-370a-46b2-b2e4-c5f2ad2ad758",
"metadata": {},
"source": [
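"**With two indices**\n",
"\n",
"When two indices are given, the first selects **rows** and the second selects **columns**.\n",
"\n",
"- The row index can be an integer, a list of integers, or a slice.\n",
"- The column index can be an integer, a string, a list of integers, a list of strings, or a slice.\n",
"\n",
"1. If both are single values, one element is extracted, and the result is a basic Python type."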
]
},
{
"cell_type": "code",
"execution_count": 72,
"id": "9ef45906-1e83-431f-aa14-bbd1374fc2d0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"df[0, 'b'] = 5\n",
"df[0, 1] = 5\n"
]
}
],
"source": [
"print(f\"{df[0, 'b'] = }\")  # value at row 0, column \"b\"\n",
"print(f\"{df[0, 1] = }\")  # value at row 0, column index 1"
]
},
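{
"cell_type": "markdown",
"id": "sketch-scalar-md",
"metadata": {},
"source": [
"To read a single element, `DataFrame.item()` is an alternative to `df[row, column]`. Here is a minimal sketch, assuming Polars 1.x:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-scalar-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"df = pl.DataFrame({\n",
"    'a': [1, 2, 3, 4],\n",
"    'b': [5, 6, 7, 8],\n",
"    'c': ['x', 'y', 'z', 'w'],\n",
"    'd': ['a', 'b', 'c', 'd'],\n",
"})\n",
"\n",
"value = df[0, 'b']\n",
"print(value, type(value).__name__)  # 5 int\n",
"\n",
"# DataFrame.item() reads the same element by row and column (name or index)\n",
"print(df.item(0, 'b'), df.item(0, 1))  # 5 5"
]
},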
{
"cell_type": "markdown",
"id": "efe5010d-615b-4470-aee9-5f3ce20f7091",
"metadata": {},
"source": [
"2. If the column index is a list or a slice, the result is a `DataFrame` object."
]
},
{
"cell_type": "code",
"execution_count": 73,
"id": "097e8254-a4cd-4249-959d-b2e1e357aa17",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"row(df[0, [\"a\", \"c\"]], df[1, \"a\":\"c\"], df[2, [0, 1, 3]], df[3, ::2])"
]
},
{
"cell_type": "code",
"execution_count": 74,
"id": "93c1e743-d11c-4bbf-bdfd-196bc927575c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"row(df[[1, 2], [\"a\", \"c\"]], df[[1, 2], \"a\":\"c\"], df[2:, [0, 1, 3]], df[2:, ::2])"
]
},
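{
"cell_type": "markdown",
"id": "sketch-two-index-df-md",
"metadata": {},
"source": [
"This minimal sketch shows the shapes produced when the column index is a list or a slice, assuming Polars 1.x:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-two-index-df-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"df = pl.DataFrame({\n",
"    'a': [1, 2, 3, 4],\n",
"    'b': [5, 6, 7, 8],\n",
"    'c': ['x', 'y', 'z', 'w'],\n",
"    'd': ['a', 'b', 'c', 'd'],\n",
"})\n",
"\n",
"print(df[0, ['a', 'c']].shape)        # (1, 2)\n",
"print(df[1, 'a':'c'].columns)         # ['a', 'b', 'c']\n",
"print(df[2:, ::2].columns)            # ['a', 'c']\n",
"print(df[[1, 2], ['a', 'c']].rows())  # [(2, 'y'), (3, 'z')]"
]
},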
{
"cell_type": "markdown",
"id": "4897e81a-26eb-4bf5-9108-c8fb007a409e",
"metadata": {},
"source": [
"3. If the row index is a list or a slice and the column index is a single value, a **`Series` object** is returned. The `Series` holds the selected rows of that one column."
]
},
{
"cell_type": "code",
"execution_count": 75,
"id": "663aa2e2-f9bc-46ca-8c24-db9c9bd224f0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"row(df[[1, 2], \"a\"], df[1:, 2])"
]
},
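{
"cell_type": "markdown",
"id": "sketch-two-index-series-md",
"metadata": {},
"source": [
"This minimal sketch shows the `Series` case, assuming Polars 1.x:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-two-index-series-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"df = pl.DataFrame({\n",
"    'a': [1, 2, 3, 4],\n",
"    'b': [5, 6, 7, 8],\n",
"    'c': ['x', 'y', 'z', 'w'],\n",
"    'd': ['a', 'b', 'c', 'd'],\n",
"})\n",
"\n",
"col = df[[1, 2], 'a']\n",
"print(type(col).__name__, col.to_list())  # Series [2, 3]\n",
"print(df[1:, 2].to_list())                # ['y', 'z', 'w']"
]
},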
{
"cell_type": "markdown",
"id": "408ad418-cce6-4ad0-a987-5b0d5f81f4cf",
"metadata": {},
"source": [
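"#### Modifying elements\n",
"\n",
"In Polars, data in a DataFrame can be modified with the `[]` operator in one of the following two ways.\n",
"\n",
"1. **Setting data through a list of columns**: specify the columns as a list and assign the corresponding data. The data must be a 2-D NumPy array of matching shape, with one row per DataFrame row and one column per listed name."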
]
},
{
"cell_type": "code",
"execution_count": 77,
"id": "9788c8b9-347f-42ce-aa4d-a6ab0d1c3a5e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (4, 4)a | b | c | d |
---|
i32 | i64 | str | str |
10 | 5 | "x" | "a" |
20 | 6 | "y" | "b" |
30 | 7 | "z" | "c" |
40 | 8 | "w" | "d" |
"
],
"text/plain": [
"shape: (4, 4)\n",
"┌─────┬─────┬─────┬─────┐\n",
"│ a ┆ b ┆ c ┆ d │\n",
"│ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i32 ┆ i64 ┆ str ┆ str │\n",
"╞═════╪═════╪═════╪═════╡\n",
"│ 10 ┆ 5 ┆ x ┆ a │\n",
"│ 20 ┆ 6 ┆ y ┆ b │\n",
"│ 30 ┆ 7 ┆ z ┆ c │\n",
"│ 40 ┆ 8 ┆ w ┆ d │\n",
"└─────┴─────┴─────┴─────┘"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Set the data of column \"a\"\n",
"df[[\"a\"]] = np.array([[10], [20], [30], [40]])\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 78,
"id": "88fda4c4-2fa3-42f6-8283-ca610467cb4d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (4, 4)a | b | c | d |
---|
i32 | i32 | str | str |
100 | 50 | "x" | "a" |
200 | 60 | "y" | "b" |
300 | 70 | "z" | "c" |
400 | 80 | "w" | "d" |
"
],
"text/plain": [
"shape: (4, 4)\n",
"┌─────┬─────┬─────┬─────┐\n",
"│ a ┆ b ┆ c ┆ d │\n",
"│ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i32 ┆ i32 ┆ str ┆ str │\n",
"╞═════╪═════╪═════╪═════╡\n",
"│ 100 ┆ 50 ┆ x ┆ a │\n",
"│ 200 ┆ 60 ┆ y ┆ b │\n",
"│ 300 ┆ 70 ┆ z ┆ c │\n",
"│ 400 ┆ 80 ┆ w ┆ d │\n",
"└─────┴─────┴─────┴─────┘"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Set the data of columns \"a\" and \"b\" at once (the transpose gives one column per name)\n",
"df[[\"a\", \"b\"]] = np.array([[100, 200, 300, 400], [50, 60, 70, 80]]).T\n",
"df"
]
},
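{
"cell_type": "markdown",
"id": "sketch-set-columns-md",
"metadata": {},
"source": [
"This sketch of the shape rule assumes Polars 1.x. For `df[[...]] = array`, the array needs one column per listed name. Assigning to a single string key such as `df['a']` is not supported. Polars raises a `TypeError` that points to `DataFrame.with_columns()`, the usual non-mutating way to replace a column."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-set-columns-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import polars as pl\n",
"\n",
"df = pl.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})\n",
"\n",
"# Two names -> the array must have shape (n_rows, 2)\n",
"df[['a', 'b']] = np.array([[10, 50], [20, 60], [30, 70], [40, 80]])\n",
"print(df['a'].to_list(), df['b'].to_list())  # [10, 20, 30, 40] [50, 60, 70, 80]\n",
"\n",
"# A plain string key is rejected\n",
"try:\n",
"    df['a'] = np.array([1, 2, 3, 4])\n",
"    rejected = False\n",
"except TypeError:\n",
"    rejected = True\n",
"print(rejected)  # True\n",
"\n",
"# Non-mutating alternative: with_columns() returns a new DataFrame\n",
"df2 = df.with_columns(pl.Series('a', [1, 2, 3, 4]))\n",
"print(df2['a'].to_list(), df['a'].to_list())  # [1, 2, 3, 4] [10, 20, 30, 40]"
]
},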
{
"cell_type": "markdown",
"id": "d3e5d9b3-9401-4ed3-a124-0eedb4d7c386",
"metadata": {},
"source": [
"2. **Changing specific elements by row and column**\n",
"\n",
"- **Row index**: an integer or a list of integers.\n",
"- **Column index**: an integer or a string."
]
},
{
"cell_type": "code",
"execution_count": 79,
"id": "e3dc5710-fdb7-43b9-8c9e-f618c3d5b3b5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (4, 4)a | b | c | d |
---|
i32 | i32 | str | str |
100 | 50 | "x" | "a" |
-1 | 60 | "y" | "b" |
300 | 70 | "z" | "c" |
400 | 80 | "w" | "d" |
"
],
"text/plain": [
"shape: (4, 4)\n",
"┌─────┬─────┬─────┬─────┐\n",
"│ a ┆ b ┆ c ┆ d │\n",
"│ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i32 ┆ i32 ┆ str ┆ str │\n",
"╞═════╪═════╪═════╪═════╡\n",
"│ 100 ┆ 50 ┆ x ┆ a │\n",
"│ -1 ┆ 60 ┆ y ┆ b │\n",
"│ 300 ┆ 70 ┆ z ┆ c │\n",
"│ 400 ┆ 80 ┆ w ┆ d │\n",
"└─────┴─────┴─────┴─────┘"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Change a single element (row 1, column \"a\")\n",
"df[1, \"a\"] = -1\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 81,
"id": "ddc2172e-0442-44c4-a06e-5156eb64534b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (4, 4)a | b | c | d |
---|
i32 | i32 | str | str |
100 | 50 | "x" | "a" |
-1 | -1 | "y" | "b" |
300 | 70 | "z" | "c" |
400 | -2 | "w" | "d" |
"
],
"text/plain": [
"shape: (4, 4)\n",
"┌─────┬─────┬─────┬─────┐\n",
"│ a ┆ b ┆ c ┆ d │\n",
"│ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i32 ┆ i32 ┆ str ┆ str │\n",
"╞═════╪═════╪═════╪═════╡\n",
"│ 100 ┆ 50 ┆ x ┆ a │\n",
"│ -1 ┆ -1 ┆ y ┆ b │\n",
"│ 300 ┆ 70 ┆ z ┆ c │\n",
"│ 400 ┆ -2 ┆ w ┆ d │\n",
"└─────┴─────┴─────┴─────┘"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Change column \"b\" in several rows at once\n",
"df[[1, 3], \"b\"] = -1, -2\n",
"df"
]
}
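,
{
"cell_type": "markdown",
"id": "sketch-set-elements-md",
"metadata": {},
"source": [
"`[]` assignment modifies `df` in place. When the original frame should be kept, a similar change can be written with expressions instead. The sketch below assumes Polars 1.x. `pl.int_range(0, pl.len())` yields the row numbers, and `pl.when().then().otherwise()` replaces values only in the selected rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-set-elements-code",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl\n",
"\n",
"df = pl.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})\n",
"\n",
"# Row numbers 0, 1, 2, ... as an expression\n",
"row_nr = pl.int_range(0, pl.len())\n",
"\n",
"# Like df[1, 'a'] = -1 and df[[1, 3], 'b'] = -1, -2, but returning a new frame\n",
"out = df.with_columns(\n",
"    pl.when(row_nr == 1).then(-1).otherwise(pl.col('a')).alias('a'),\n",
"    pl.when(row_nr == 1).then(-1)\n",
"      .when(row_nr == 3).then(-2)\n",
"      .otherwise(pl.col('b')).alias('b'),\n",
")\n",
"print(out['a'].to_list(), out['b'].to_list())  # [1, -1, 3, 4] [5, -1, 7, -2]\n",
"print(df['a'].to_list())  # [1, 2, 3, 4] (unchanged)"
]
}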
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}