{ "cells": [ { "cell_type": "markdown", "id": "85070315-e811-42e8-a0a0-c8b3c406dab6", "metadata": {}, "source": [ "# DataFrames" ] }, { "cell_type": "markdown", "id": "c81bd7ed-68a5-49ac-96da-734973786932", "metadata": {}, "source": [ "The DataFrame is the central data structure in Polars and represents two-dimensional data made up of rows and columns. All values within a column share a single data type, while different columns may have different types." ] }, { "cell_type": "code", "execution_count": 2, "id": "c7385efc-9537-4d82-8e7d-713f5d18ceb8", "metadata": {}, "outputs": [], "source": [ "import polars as pl\n", "from helper.jupyter import row" ] }, { "cell_type": "markdown", "id": "ee1d168a-4f1d-4059-893c-4cbd4a9c298a", "metadata": {}, "source": [ "## Creating a DataFrame\n", "\n", "This section explains how to create a DataFrame from other Python objects." ] }, { "cell_type": "markdown", "id": "ecce10ba-a945-4330-ba37-dec30370210f", "metadata": {}, "source": [ "### Combinations of lists and dictionaries" ] }, { "cell_type": "markdown", "id": "bf1572ab-1637-4453-b39f-e50600550191", "metadata": {}, "source": [ "The following program creates `DataFrame` objects from different Python data structures (a list of dictionaries, a dictionary of lists, and a list of lists).\n", "\n", "1. List of dictionaries (`list[dict]`): the dictionary keys become the column names, and each dictionary in the list supplies one row.\n", "2. Dictionary of lists (`dict[list]`): the dictionary keys become the column names, and the elements of each list become the values of that column.\n", "3. List of lists (`list[list]`): the column names are given with the `schema` argument.\n", "    * With `orient='row'`, the data is row-oriented: each inner list supplies one row.\n", "    * With `orient='col'`, the data is column-oriented: each inner list supplies one column. In this example the inner lists mix data types, so `strict=False` is passed to enable automatic type conversion." ] }, { "cell_type": "code", "execution_count": 3, "id": "1f3ce2fa-510e-4be0-a758-43d841ea8526", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
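\n", "<div><small>shape: (3, 2)</small><table><thead><tr><th>name</th><th>age</th></tr><tr><td>str</td><td>i64</td></tr></thead><tbody><tr><td>&quot;Alice&quot;</td><td>30</td></tr><tr><td>&quot;Bob&quot;</td><td>25</td></tr><tr><td>&quot;Charlie&quot;</td><td>35</td></tr></tbody></table></div>\n",
"<div><small>shape: (3, 2)</small><table><thead><tr><th>name</th><th>age</th></tr><tr><td>str</td><td>i64</td></tr></thead><tbody><tr><td>&quot;Alice&quot;</td><td>30</td></tr><tr><td>&quot;Bob&quot;</td><td>25</td></tr><tr><td>&quot;Charlie&quot;</td><td>35</td></tr></tbody></table></div>\n",
"<div><small>shape: (3, 2)</small><table><thead><tr><th>name</th><th>age</th></tr><tr><td>str</td><td>i64</td></tr></thead><tbody><tr><td>&quot;Alice&quot;</td><td>30</td></tr><tr><td>&quot;Bob&quot;</td><td>25</td></tr><tr><td>&quot;Charlie&quot;</td><td>35</td></tr></tbody></table></div>\n",
"<div><small>shape: (2, 3)</small><table><thead><tr><th>p1</th><th>p2</th><th>p3</th></tr><tr><td>str</td><td>str</td><td>str</td></tr></thead><tbody><tr><td>&quot;Alice&quot;</td><td>&quot;Bob&quot;</td><td>&quot;Charlie&quot;</td></tr><tr><td>&quot;30&quot;</td><td>&quot;25&quot;</td><td>&quot;35&quot;</td></tr></tbody></table></div>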
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# list[dict]\n", "dict_in_list = [\n", " {\"name\": \"Alice\", \"age\": 30},\n", " {\"name\": \"Bob\", \"age\": 25},\n", " {\"name\": \"Charlie\", \"age\": 35}\n", "]\n", "df1 = pl.DataFrame(dict_in_list)\n", "\n", "# dict[list]\n", "list_in_dict = {\n", " \"name\": [\"Alice\", \"Bob\", \"Charlie\"],\n", " \"age\": [30, 25, 35],\n", "}\n", "df2 = pl.DataFrame(list_in_dict)\n", "\n", "# list[list]\n", "list_in_list = [\n", " [\"Alice\", 30],\n", " [\"Bob\", 25],\n", " [\"Charlie\", 35]\n", "]\n", "columns = [\"name\", \"age\"] # カラム名を指定\n", "df3 = pl.DataFrame(list_in_list, schema=columns, orient='row')\n", "df4 = pl.DataFrame(list_in_list, schema=['p1', 'p2', 'p3'], orient='col', strict=False)\n", "row(df1, df2, df3, df4)" ] }, { "cell_type": "markdown", "id": "84ad5590-8224-4d0e-b55b-77d302fc6c81", "metadata": {}, "source": [ "次の`data` は辞書形式で、次のようなデータを持っています:\n", "\n", " - `\"point\"` キーの値は辞書のリスト (`list[dict]`) で、各辞書には`x` と `y` の2つのキーが含まれています。\n", " - `\"weight\"` キーの値は整数のリスト (`list[int]`) です。\n", "\n", "データフレームに変換するとき、外側の辞書のキーは列名になり、`point`列の要素は`Struct`型(構造体)に変換されます。" ] }, { "cell_type": "code", "execution_count": 4, "id": "9f14e575-4397-4c4b-859f-db6c3970138b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
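\n", "<div><small>shape: (3, 2)</small><table><thead><tr><th>point</th><th>weight</th></tr><tr><td>struct[2]</td><td>i64</td></tr></thead><tbody><tr><td>{1,2}</td><td>5</td></tr><tr><td>{3,4}</td><td>4</td></tr><tr><td>{5,6}</td><td>8</td></tr></tbody></table></div>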
" ], "text/plain": [ "shape: (3, 2)\n", "┌───────────┬────────┐\n", "│ point ┆ weight │\n", "│ --- ┆ --- │\n", "│ struct[2] ┆ i64 │\n", "╞═══════════╪════════╡\n", "│ {1,2} ┆ 5 │\n", "│ {3,4} ┆ 4 │\n", "│ {5,6} ┆ 8 │\n", "└───────────┴────────┘" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = {\n", " \"point\": [{\"x\": 1, \"y\": 2}, {\"x\": 3, \"y\": 4}, {\"x\": 5, \"y\": 6}],\n", " \"weight\": [5, 4, 8],\n", "}\n", "pl.DataFrame(data)" ] }, { "cell_type": "markdown", "id": "68739442-76a4-49cf-84de-3e666235504c", "metadata": {}, "source": [ "### NumPyの配列" ] }, { "cell_type": "markdown", "id": "89dcd835-c290-4587-a174-7543ba5c9b32", "metadata": {}, "source": [ "NumPyの配列を扱う際、以下のようにlistとNumPy配列と互換性を持ちます。\n", "\n", "* `dict[list]`と`dict[1次元配列]`は同じ扱い\n", "* `list[list]`と2次元配列は同じ扱い" ] }, { "cell_type": "code", "execution_count": 5, "id": "c761d1cb-c5d3-4669-ad59-da095ce5e16f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
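\n", "<div><small>shape: (3, 2)</small><table><thead><tr><th>x</th><th>y</th></tr><tr><td>i32</td><td>i32</td></tr></thead><tbody><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr><tr><td>5</td><td>6</td></tr></tbody></table></div>\n",
"<div><small>shape: (3, 2)</small><table><thead><tr><th>x</th><th>y</th></tr><tr><td>i32</td><td>i32</td></tr></thead><tbody><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr><tr><td>5</td><td>6</td></tr></tbody></table></div>\n",
"<div><small>shape: (2, 3)</small><table><thead><tr><th>p1</th><th>p2</th><th>p3</th></tr><tr><td>i32</td><td>i32</td><td>i32</td></tr></thead><tbody><tr><td>1</td><td>3</td><td>5</td></tr><tr><td>2</td><td>4</td><td>6</td></tr></tbody></table></div>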
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "array_in_dict = {\n", " \"x\": np.array([1, 3, 5]),\n", " \"y\": np.array([2, 4, 6]),\n", "}\n", "\n", "df1 = pl.DataFrame(array_in_dict)\n", "\n", "array_2d = np.array([[1, 2], [3, 4], [5, 6]])\n", "df2 = pl.DataFrame(array_2d, schema=['x', 'y'], orient='row')\n", "df3 = pl.DataFrame(array_2d, schema=['p1', 'p2', 'p3'], orient='col')\n", "row(df1, df2, df3)" ] }, { "cell_type": "markdown", "id": "f21f1c4f-4539-4970-91f2-af6ca3cb244c", "metadata": {}, "source": [ "1次元の構造化配列をデータフレームに変換する場合は、配列の各フィールドはデータフレームの各列になります。" ] }, { "cell_type": "code", "execution_count": 6, "id": "795cbc29-23a2-4dc2-9001-12d34cd35c67", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
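\n", "<div><small>shape: (3, 2)</small><table><thead><tr><th>x</th><th>y</th></tr><tr><td>i16</td><td>i16</td></tr></thead><tbody><tr><td>1</td><td>30</td></tr><tr><td>2</td><td>25</td></tr><tr><td>3</td><td>35</td></tr></tbody></table></div>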
" ], "text/plain": [ "shape: (3, 2)\n", "┌─────┬─────┐\n", "│ x ┆ y │\n", "│ --- ┆ --- │\n", "│ i16 ┆ i16 │\n", "╞═════╪═════╡\n", "│ 1 ┆ 30 │\n", "│ 2 ┆ 25 │\n", "│ 3 ┆ 35 │\n", "└─────┴─────┘" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.array([\n", " (1, 30),\n", " (2, 25),\n", " (3, 35)], dtype=[('x', 'i2'), ('y', 'i2')])\n", "\n", "pl.DataFrame(arr)" ] }, { "cell_type": "markdown", "id": "c3e7beb0-be74-4f65-aa78-4f9c0c0b5a69", "metadata": {}, "source": [ "### Seriesを含むデータ" ] }, { "cell_type": "markdown", "id": "76949b0a-54b6-4134-9828-063d1ad1993d", "metadata": {}, "source": [ "`pl.Series` を扱う場合、`list[Series]` や `dict[Series]` の形式をデータフレームに変換することがよくあります。どちらの場合も、それぞれの `Series` はデータフレームの列になりますが、列名の扱いが異なります。\n", "\n", "- `list[Series]`: 列名は `Series` の名前がそのまま使われます。\n", "- `dict[Series]`: 列名は辞書のキーが使われます。" ] }, { "cell_type": "code", "execution_count": 7, "id": "a3fe401b-d85c-4059-829f-6142f97d746e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
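\n", "<div><small>shape: (3, 2)</small><table><thead><tr><th>x</th><th>y</th></tr><tr><td>i64</td><td>i64</td></tr></thead><tbody><tr><td>1</td><td>4</td></tr><tr><td>2</td><td>5</td></tr><tr><td>3</td><td>6</td></tr></tbody></table></div>\n",
"<div><small>shape: (3, 2)</small><table><thead><tr><th>A</th><th>B</th></tr><tr><td>i64</td><td>i64</td></tr></thead><tbody><tr><td>1</td><td>4</td></tr><tr><td>2</td><td>5</td></tr><tr><td>3</td><td>6</td></tr></tbody></table></div>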
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sx = pl.Series('x', [1, 2, 3])\n", "sy = pl.Series('y', [4, 5, 6])\n", "\n", "df1 = pl.DataFrame([sx, sy])\n", "df2 = pl.DataFrame({'A':sx, 'B':sy})\n", "row(df1, df2)" ] }, { "cell_type": "markdown", "id": "a757d2c6-a311-4162-89ea-99b43e1e7ac1", "metadata": {}, "source": [ "### pl.from_*()関数" ] }, { "cell_type": "markdown", "id": "add40361-d2d0-449c-9e35-bb982e9d5258", "metadata": {}, "source": [ "`from_` で始まる関数は、さまざまなデータ型をデータフレームに変換するために使用されます。これらの関数を利用すると、意図しないデータ変換が発生しにくく、コードのロバスト性を向上させることができます。\n", "\n", "- `pl.from_dict()`: `dict[list]` のデータから変換\n", "- `pl.from_dicts()`: `list[dict]` のデータから変換\n", "- `pl.from_numpy()`: NumPy の配列から変換\n", "- `pl.from_records()`: `list[list]` のデータから変換\n", "- `pl.from_pandas()`: Pandasの`DataFrame`オブジェクトから変換\n", "- `pl.from_arrow()`: pyarrowの`Array`或いは`Table`オブジェクトから変換" ] }, { "cell_type": "markdown", "id": "bc4968b1-6e92-44ae-94f4-a5073c90e4bb", "metadata": {}, "source": [ "## データフレームの属性" ] }, { "cell_type": "code", "execution_count": 8, "id": "e3fca09d-55b2-48ff-8bdb-7574cbdc9b7e", "metadata": {}, "outputs": [], "source": [ "df = pl.DataFrame(\n", " {\n", " \"a\": [3, 3, 3, 4],\n", " \"b\": [4.0, 12, 6, 7],\n", " \"g\": ['A', 'B', 'A', 'B']\n", " }\n", ")" ] }, { "cell_type": "markdown", "id": "71c1bb23-cb89-4a3d-8181-8b674d09b574", "metadata": {}, "source": [ "`shape`属性でデータフレームの形状(高さ、幅)を取得できます。又`height`と`width`属性で高さと幅を取得することもできます。`len()`関数でも高さを取得できます。" ] }, { "cell_type": "code", "execution_count": 9, "id": "bab3d3ae-fe6a-4d2d-9d29-ed522cf29614", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "df.shape = (4, 3)\n", "df.height = 4\n", "df.width = 3\n", "len(df) = 4\n" ] } ], "source": [ "print(f\"{df.shape = }\")\n", "print(f\"{df.height = }\")\n", "print(f\"{df.width = }\")\n", "print(f\"{len(df) = }\")" ] }, { "cell_type": "markdown", "id": "70e1a124-87d0-46fc-9e16-b9a9b49e410f", "metadata": {}, "source": [ 
"`columns`属性で列名のリストを取得できます。" ] }, { "cell_type": "code", "execution_count": 10, "id": "b0351aa8-a032-442d-9751-14ba1f4f879d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['a', 'b', 'g']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "id": "8f0e1417-b9b0-4d2d-94c2-ba921ff471ee", "metadata": {}, "source": [ "`schema`属性で列名と列のデータ型を保存する辞書オブジェクトを取得できます。" ] }, { "cell_type": "code", "execution_count": 11, "id": "c99da371-503f-4bf4-be58-def65766d4fd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Schema([('a', Int64), ('b', Float64), ('g', String)])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.schema" ] }, { "cell_type": "code", "execution_count": 12, "id": "05840e3d-318b-44b8-82fd-528ad7e2d28f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "String" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.schema['g']" ] }, { "cell_type": "markdown", "id": "54e739c7-ff40-42e1-ad4c-4079244db783", "metadata": {}, "source": [ "`dtypes`属性で、各個列のデータ型を保存するリストを取得できます。" ] }, { "cell_type": "code", "execution_count": 13, "id": "d5c33b09-4580-41c9-963f-e34ff051e804", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Int64, Float64, String]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "id": "2d66602f-2c12-4b35-8b33-002fa668a37e", "metadata": {}, "source": [ "`flags`属性で各個列のソート状態を取得できます。" ] }, { "cell_type": "code", "execution_count": 14, "id": "0a654663-e837-4adc-a834-ce760f52f50e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'a': {'SORTED_ASC': False, 'SORTED_DESC': False},\n", " 'b': {'SORTED_ASC': False, 'SORTED_DESC': False},\n", " 'g': {'SORTED_ASC': False, 'SORTED_DESC': False}}" ] }, "execution_count": 14, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "df.flags" ] }, { "cell_type": "markdown", "id": "97789e60-d5ae-442f-a7d2-e9e6fdce8f0b", "metadata": {}, "source": [ "## データフレームからデータ取得" ] }, { "cell_type": "markdown", "id": "bee5cfbd-67ba-4111-94c9-2d0b8100ded8", "metadata": {}, "source": [ "本節は、データフレームから列、行、或いは単一の値を取得する方法について説明します。" ] }, { "cell_type": "markdown", "id": "cb3fb550-ab0d-4250-b798-986736c28adc", "metadata": {}, "source": [ "### 列を取得" ] }, { "cell_type": "markdown", "id": "6da287a3-53b2-4e4c-853a-aa216630db8b", "metadata": {}, "source": [ "PolarsでDataFrameから列データをSeriesとして取得する方法はいくつかあります。\n", "\n", "* `DataFrame.to_series()`: インデックスで列を取得します。\n", "* `DataFrame.get_column()`: 列名で列を取得します。\n", "* `DataFrame.get_columns()`: すべての列を取得します。\n", "* `DataFrame.iter_columns()`: 列のイテレーターを取得します。\n", "\n", "`DataFrame.to_series()` メソッドを使用すると、指定したインデックスに基づいて列を Series として取得できます。`DataFrame.get_column()` メソッドを使用すると、列名を指定して Series を取得できます。`DataFrame[\"column_name\"]`のような辞書形式で列名を指定して Seriesを取得することもできます。" ] }, { "cell_type": "code", "execution_count": 15, "id": "3dc0cb3a-d39d-42c2-bd2b-e0b63a8c851b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
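\n", "<div><small>shape: (4,)</small><table><thead><tr><th>a</th></tr><tr><td>i64</td></tr></thead><tbody><tr><td>3</td></tr><tr><td>3</td></tr><tr><td>3</td></tr><tr><td>4</td></tr></tbody></table></div>\n",
"<div><small>shape: (4,)</small><table><thead><tr><th>b</th></tr><tr><td>f64</td></tr></thead><tbody><tr><td>4.0</td></tr><tr><td>12.0</td></tr><tr><td>6.0</td></tr><tr><td>7.0</td></tr></tbody></table></div>\n",
"<div><small>shape: (4,)</small><table><thead><tr><th>g</th></tr><tr><td>str</td></tr></thead><tbody><tr><td>&quot;A&quot;</td></tr><tr><td>&quot;B&quot;</td></tr><tr><td>&quot;A&quot;</td></tr><tr><td>&quot;B&quot;</td></tr></tbody></table></div>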
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "s1 = df.to_series(0)\n", "s2 = df.get_column('b')\n", "s3 = df['g']\n", "row(s1, s2, s3)" ] }, { "cell_type": "markdown", "id": "977a4de8-d3a1-4b90-a5d9-c7cab592670b", "metadata": {}, "source": [ "`DataFrame.get_columns()` メソッドは、DataFrame 内のすべての列を Series のリストとして取得します。" ] }, { "cell_type": "code", "execution_count": 16, "id": "e4d3455e-7ac0-48b2-b2c6-128419babdb6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
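\n", "<div><small>shape: (4,)</small><table><thead><tr><th>a</th></tr><tr><td>i64</td></tr></thead><tbody><tr><td>3</td></tr><tr><td>3</td></tr><tr><td>3</td></tr><tr><td>4</td></tr></tbody></table></div>\n",
"<div><small>shape: (4,)</small><table><thead><tr><th>b</th></tr><tr><td>f64</td></tr></thead><tbody><tr><td>4.0</td></tr><tr><td>12.0</td></tr><tr><td>6.0</td></tr><tr><td>7.0</td></tr></tbody></table></div>\n",
"<div><small>shape: (4,)</small><table><thead><tr><th>g</th></tr><tr><td>str</td></tr></thead><tbody><tr><td>&quot;A&quot;</td></tr><tr><td>&quot;B&quot;</td></tr><tr><td>&quot;A&quot;</td></tr><tr><td>&quot;B&quot;</td></tr></tbody></table></div>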
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sa, sb, sg = df.get_columns()\n", "row(sa, sb, sg)" ] }, { "cell_type": "markdown", "id": "5215980b-6a4d-4d52-8d50-70a20cfb3b2f", "metadata": {}, "source": [ "`DataFrame.iter_columns()`は、DataFrame内のすべての列を一つずつ返します。" ] }, { "cell_type": "code", "execution_count": 17, "id": "f1a5d14f-bbf1-4c31-b0ea-8a10aef5d399", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a [3, 3, 3, 4]\n", "b [4.0, 12.0, 6.0, 7.0]\n", "g ['A', 'B', 'A', 'B']\n" ] } ], "source": [ "for col in df.iter_columns():\n", " print(col.name, col.to_list())" ] }, { "cell_type": "markdown", "id": "c0ef13fe-d21f-469c-b9b3-524f7bc7f747", "metadata": {}, "source": [ "### Seriesオブジェクト" ] }, { "cell_type": "code", "execution_count": 18, "id": "5b34d4dd-b56a-41ee-9556-1d4483ee7fdd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "s1.name = 'a'\n", "s1.dtype = Int64\n", "s1.flags = {'SORTED_ASC': False, 'SORTED_DESC': False}\n", "s1.shape = (4,)\n", "len(s1) = 4\n" ] } ], "source": [ "print(f\"{s1.name = }\")\n", "print(f\"{s1.dtype = }\")\n", "print(f\"{s1.flags = }\")\n", "print(f\"{s1.shape = }\")\n", "print(f\"{len(s1) = }\")" ] }, { "cell_type": "markdown", "id": "a462d925-7ca0-41e1-afab-c34f6e8f811c", "metadata": {}, "source": [ "### to_numpy()メソッド" ] }, { "cell_type": "markdown", "id": "34e926d3-f939-4863-82a6-ecdeffc38be3", "metadata": {}, "source": [ "`Series.to_numpy()`または`Series.to_list()`メソッドを使用すると、`Series`オブジェクトをNumPyの配列やリストに変換することができます。" ] }, { "cell_type": "code", "execution_count": 19, "id": "3a795f08-d08f-48da-9307-8373286b2df9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sa.to_numpy() = array([3, 3, 3, 4], dtype=int64)\n", "sa.to_list() = [3, 3, 3, 4]\n" ] } ], "source": [ "print(f'{sa.to_numpy() = }')\n", "print(f'{sa.to_list() = }')" ] }, { "cell_type": "markdown", "id": "8fd6ac81-13c8-4e2e-b914-299a874b4e5c", 
"metadata": {}, "source": [ "`DataFrame.to_numpy()`でNumPyの配列に変換することができます。デフォルトはすべての値を一番上位のデータ型に変換します。数値と文字列混在のデータの場合は、`dtype=object`の配列になります。" ] }, { "cell_type": "code", "execution_count": 20, "id": "0b1927c5-1127-44a8-9a06-dff3b298f1d2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[3, 4.0, 'A'],\n", " [3, 12.0, 'B'],\n", " [3, 6.0, 'A'],\n", " [4, 7.0, 'B']], dtype=object)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.to_numpy()" ] }, { "cell_type": "markdown", "id": "9ac10ad4-0828-49a5-82da-316b55606115", "metadata": {}, "source": [ "`structured`引数を`True`にすれば、構造化配列に変換します。" ] }, { "cell_type": "code", "execution_count": 21, "id": "15b6d758-3880-40bf-8240-4067a676dd59", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([(3, 4., 'A'), (3, 12., 'B'), (3, 6., 'A'), (4, 7., 'B')],\n", " dtype=[('a', '
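\n", "<div><small>shape: (1, 4)</small><table><thead><tr><th>a</th><th>b</th><th>c</th><th>d</th></tr><tr><td>i64</td><td>i64</td><td>str</td><td>str</td></tr></thead><tbody><tr><td>1</td><td>5</td><td>&quot;x&quot;</td><td>&quot;a&quot;</td></tr></tbody></table></div>\n",
"<div><small>shape: (2, 4)</small><table><thead><tr><th>a</th><th>b</th><th>c</th><th>d</th></tr><tr><td>i64</td><td>i64</td><td>str</td><td>str</td></tr></thead><tbody><tr><td>1</td><td>5</td><td>&quot;x&quot;</td><td>&quot;a&quot;</td></tr><tr><td>3</td><td>7</td><td>&quot;z&quot;</td><td>&quot;c&quot;</td></tr></tbody></table></div>\n",
"<div><small>shape: (2, 4)</small><table><thead><tr><th>a</th><th>b</th><th>c</th><th>d</th></tr><tr><td>i64</td><td>i64</td><td>str</td><td>str</td></tr></thead><tbody><tr><td>2</td><td>6</td><td>&quot;y&quot;</td><td>&quot;b&quot;</td></tr><tr><td>3</td><td>7</td><td>&quot;z&quot;</td><td>&quot;c&quot;</td></tr></tbody></table></div>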
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "row(df[0], df[[0, 2]], df[1:3])" ] }, { "cell_type": "markdown", "id": "66354118-a328-44d7-9d83-1ddcf04bec78", "metadata": {}, "source": [ "2. 文字列、文字列のリスト、文字列のスライスの場合: **列を選択**します。スライスの場合、終了値が含まれます。" ] }, { "cell_type": "code", "execution_count": 71, "id": "bce620e0-6579-434d-858c-629a983931f2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
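\n", "<div><small>shape: (4,)</small><table><thead><tr><th>a</th></tr><tr><td>i64</td></tr></thead><tbody><tr><td>1</td></tr><tr><td>2</td></tr><tr><td>3</td></tr><tr><td>4</td></tr></tbody></table></div>\n",
"<div><small>shape: (4, 2)</small><table><thead><tr><th>a</th><th>d</th></tr><tr><td>i64</td><td>str</td></tr></thead><tbody><tr><td>1</td><td>&quot;a&quot;</td></tr><tr><td>2</td><td>&quot;b&quot;</td></tr><tr><td>3</td><td>&quot;c&quot;</td></tr><tr><td>4</td><td>&quot;d&quot;</td></tr></tbody></table></div>\n",
"<div><small>shape: (4, 3)</small><table><thead><tr><th>a</th><th>b</th><th>c</th></tr><tr><td>i64</td><td>i64</td><td>str</td></tr></thead><tbody><tr><td>1</td><td>5</td><td>&quot;x&quot;</td></tr><tr><td>2</td><td>6</td><td>&quot;y&quot;</td></tr><tr><td>3</td><td>7</td><td>&quot;z&quot;</td></tr><tr><td>4</td><td>8</td><td>&quot;w&quot;</td></tr></tbody></table></div>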
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "row(df[\"a\"], df[[\"a\", \"d\"]], df[\"a\":\"c\"])" ] }, { "cell_type": "markdown", "id": "a5c0e9a9-370a-46b2-b2e4-c5f2ad2ad758", "metadata": {}, "source": [ "**インデックスが2つの場合**\n", "\n", "インデックスが2つ指定される場合、1つ目は**行**の選択、2つ目は**列**の選択に使われます。\n", "\n", "- 行のインデックスには整数、整数のリスト、またはスライスを使用します。\n", "- 列のインデックスには整数、文字列、整数のリスト、文字列のリスト、またはスライスを使用できます。\n", "\n", "1. 両方が単一の値の場合: 特定の要素を抽出します。結果はPythonの基本データ型となります。" ] }, { "cell_type": "code", "execution_count": 72, "id": "9ef45906-1e83-431f-aa14-bbd1374fc2d0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "df[0, 'b'] = 5\n", "df[0, 1] = 5\n" ] } ], "source": [ "print(f\"{df[0, 'b'] = }\") # 行0、列\"b\"の値を抽出\n", "print(f\"{df[0, 1] = }\") # 行0、列インデックス1の値を抽出" ] }, { "cell_type": "markdown", "id": "efe5010d-615b-4470-aee9-5f3ce20f7091", "metadata": {}, "source": [ "2. 列のインデックスがリストやスライスの場合: 結果は`DataFrame`オブジェクトになります。" ] }, { "cell_type": "code", "execution_count": 73, "id": "097e8254-a4cd-4249-959d-b2e1e357aa17", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
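\n", "<div><small>shape: (1, 2)</small><table><thead><tr><th>a</th><th>c</th></tr><tr><td>i64</td><td>str</td></tr></thead><tbody><tr><td>1</td><td>&quot;x&quot;</td></tr></tbody></table></div>\n",
"<div><small>shape: (1, 3)</small><table><thead><tr><th>a</th><th>b</th><th>c</th></tr><tr><td>i64</td><td>i64</td><td>str</td></tr></thead><tbody><tr><td>2</td><td>6</td><td>&quot;y&quot;</td></tr></tbody></table></div>\n",
"<div><small>shape: (1, 3)</small><table><thead><tr><th>a</th><th>b</th><th>d</th></tr><tr><td>i64</td><td>i64</td><td>str</td></tr></thead><tbody><tr><td>3</td><td>7</td><td>&quot;c&quot;</td></tr></tbody></table></div>\n",
"<div><small>shape: (1, 2)</small><table><thead><tr><th>a</th><th>c</th></tr><tr><td>i64</td><td>str</td></tr></thead><tbody><tr><td>4</td><td>&quot;w&quot;</td></tr></tbody></table></div>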
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "row(df[0, [\"a\", \"c\"]], df[1, \"a\":\"c\"], df[2, [0, 1, 3]], df[3, ::2])" ] }, { "cell_type": "code", "execution_count": 74, "id": "93c1e743-d11c-4bbf-bdfd-196bc927575c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
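\n", "<div><small>shape: (2, 2)</small><table><thead><tr><th>a</th><th>c</th></tr><tr><td>i64</td><td>str</td></tr></thead><tbody><tr><td>2</td><td>&quot;y&quot;</td></tr><tr><td>3</td><td>&quot;z&quot;</td></tr></tbody></table></div>\n",
"<div><small>shape: (2, 3)</small><table><thead><tr><th>a</th><th>b</th><th>c</th></tr><tr><td>i64</td><td>i64</td><td>str</td></tr></thead><tbody><tr><td>2</td><td>6</td><td>&quot;y&quot;</td></tr><tr><td>3</td><td>7</td><td>&quot;z&quot;</td></tr></tbody></table></div>\n",
"<div><small>shape: (2, 3)</small><table><thead><tr><th>a</th><th>b</th><th>d</th></tr><tr><td>i64</td><td>i64</td><td>str</td></tr></thead><tbody><tr><td>3</td><td>7</td><td>&quot;c&quot;</td></tr><tr><td>4</td><td>8</td><td>&quot;d&quot;</td></tr></tbody></table></div>\n",
"<div><small>shape: (2, 2)</small><table><thead><tr><th>a</th><th>c</th></tr><tr><td>i64</td><td>str</td></tr></thead><tbody><tr><td>3</td><td>&quot;z&quot;</td></tr><tr><td>4</td><td>&quot;w&quot;</td></tr></tbody></table></div>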
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "row(df[[1, 2], [\"a\", \"c\"]], df[[1, 2], \"a\":\"c\"], df[2:, [0, 1, 3]], df[2:, ::2])" ] }, { "cell_type": "markdown", "id": "4897e81a-26eb-4bf5-9108-c8fb007a409e", "metadata": {}, "source": [ "3. 行のインデックスがリストやスライスで、列のインデックスが単独の要素の場合: **`Series`オブジェクト**が返されます。`Series`は、選択された特定の列の行データを保持します。" ] }, { "cell_type": "code", "execution_count": 75, "id": "663aa2e2-f9bc-46ca-8c24-db9c9bd224f0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
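\n", "<div><small>shape: (2,)</small><table><thead><tr><th>a</th></tr><tr><td>i64</td></tr></thead><tbody><tr><td>2</td></tr><tr><td>3</td></tr></tbody></table></div>\n",
"<div><small>shape: (3,)</small><table><thead><tr><th>c</th></tr><tr><td>str</td></tr></thead><tbody><tr><td>&quot;y&quot;</td></tr><tr><td>&quot;z&quot;</td></tr><tr><td>&quot;w&quot;</td></tr></tbody></table></div>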
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "row(df[[1, 2], \"a\"], df[1:, 2])" ] }, { "cell_type": "markdown", "id": "408ad418-cce6-4ad0-a987-5b0d5f81f4cf", "metadata": {}, "source": [ "#### 要素の変更\n", "\n", "Polarsでは、`[]`演算を使用してDataFrame内のデータを変更することができます。これには以下の2つの方法があります。\n", "\n", "1. **列リストでデータを設定する場合**: 列をリストで指定し、それに対応するデータを設定します。設定するデータは、形状が一致するNumPy配列である必要があります。" ] }, { "cell_type": "code", "execution_count": 77, "id": "9788c8b9-347f-42ce-aa4d-a6ab0d1c3a5e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
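\n", "<div><small>shape: (4, 4)</small><table><thead><tr><th>a</th><th>b</th><th>c</th><th>d</th></tr><tr><td>i32</td><td>i64</td><td>str</td><td>str</td></tr></thead><tbody><tr><td>10</td><td>5</td><td>&quot;x&quot;</td><td>&quot;a&quot;</td></tr><tr><td>20</td><td>6</td><td>&quot;y&quot;</td><td>&quot;b&quot;</td></tr><tr><td>30</td><td>7</td><td>&quot;z&quot;</td><td>&quot;c&quot;</td></tr><tr><td>40</td><td>8</td><td>&quot;w&quot;</td><td>&quot;d&quot;</td></tr></tbody></table></div>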
" ], "text/plain": [ "shape: (4, 4)\n", "┌─────┬─────┬─────┬─────┐\n", "│ a ┆ b ┆ c ┆ d │\n", "│ --- ┆ --- ┆ --- ┆ --- │\n", "│ i32 ┆ i64 ┆ str ┆ str │\n", "╞═════╪═════╪═════╪═════╡\n", "│ 10 ┆ 5 ┆ x ┆ a │\n", "│ 20 ┆ 6 ┆ y ┆ b │\n", "│ 30 ┆ 7 ┆ z ┆ c │\n", "│ 40 ┆ 8 ┆ w ┆ d │\n", "└─────┴─────┴─────┴─────┘" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 列\"a\"のデータを設定\n", "df[[\"a\"]] = np.array([[10], [20], [30], [40]])\n", "df" ] }, { "cell_type": "code", "execution_count": 78, "id": "88fda4c4-2fa3-42f6-8283-ca610467cb4d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
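\n", "<div><small>shape: (4, 4)</small><table><thead><tr><th>a</th><th>b</th><th>c</th><th>d</th></tr><tr><td>i32</td><td>i32</td><td>str</td><td>str</td></tr></thead><tbody><tr><td>100</td><td>50</td><td>&quot;x&quot;</td><td>&quot;a&quot;</td></tr><tr><td>200</td><td>60</td><td>&quot;y&quot;</td><td>&quot;b&quot;</td></tr><tr><td>300</td><td>70</td><td>&quot;z&quot;</td><td>&quot;c&quot;</td></tr><tr><td>400</td><td>80</td><td>&quot;w&quot;</td><td>&quot;d&quot;</td></tr></tbody></table></div>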
" ], "text/plain": [ "shape: (4, 4)\n", "┌─────┬─────┬─────┬─────┐\n", "│ a ┆ b ┆ c ┆ d │\n", "│ --- ┆ --- ┆ --- ┆ --- │\n", "│ i32 ┆ i32 ┆ str ┆ str │\n", "╞═════╪═════╪═════╪═════╡\n", "│ 100 ┆ 50 ┆ x ┆ a │\n", "│ 200 ┆ 60 ┆ y ┆ b │\n", "│ 300 ┆ 70 ┆ z ┆ c │\n", "│ 400 ┆ 80 ┆ w ┆ d │\n", "└─────┴─────┴─────┴─────┘" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 列\"a\"と列\"b\"のデータを同時に設定\n", "df[[\"a\", \"b\"]] = np.array([[100, 200, 300, 400], [50, 60, 70, 80]]).T\n", "df" ] }, { "cell_type": "markdown", "id": "d3e5d9b3-9401-4ed3-a124-0eedb4d7c386", "metadata": {}, "source": [ "2. 行・列を指定して特定の要素を変更する場合\n", "\n", "- **行のインデックス**: 整数または整数のリストで指定します。\n", "- **列のインデックス**: 整数または文字列で指定します。" ] }, { "cell_type": "code", "execution_count": 79, "id": "e3dc5710-fdb7-43b9-8c9e-f618c3d5b3b5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
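\n", "<div><small>shape: (4, 4)</small><table><thead><tr><th>a</th><th>b</th><th>c</th><th>d</th></tr><tr><td>i32</td><td>i32</td><td>str</td><td>str</td></tr></thead><tbody><tr><td>100</td><td>50</td><td>&quot;x&quot;</td><td>&quot;a&quot;</td></tr><tr><td>-1</td><td>60</td><td>&quot;y&quot;</td><td>&quot;b&quot;</td></tr><tr><td>300</td><td>70</td><td>&quot;z&quot;</td><td>&quot;c&quot;</td></tr><tr><td>400</td><td>80</td><td>&quot;w&quot;</td><td>&quot;d&quot;</td></tr></tbody></table></div>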
" ], "text/plain": [ "shape: (4, 4)\n", "┌─────┬─────┬─────┬─────┐\n", "│ a ┆ b ┆ c ┆ d │\n", "│ --- ┆ --- ┆ --- ┆ --- │\n", "│ i32 ┆ i32 ┆ str ┆ str │\n", "╞═════╪═════╪═════╪═════╡\n", "│ 100 ┆ 50 ┆ x ┆ a │\n", "│ -1 ┆ 60 ┆ y ┆ b │\n", "│ 300 ┆ 70 ┆ z ┆ c │\n", "│ 400 ┆ 80 ┆ w ┆ d │\n", "└─────┴─────┴─────┴─────┘" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 単一の行と列のデータを変更\n", "df[1, \"a\"] = -1\n", "df" ] }, { "cell_type": "code", "execution_count": 81, "id": "ddc2172e-0442-44c4-a06e-5156eb64534b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
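\n", "<div><small>shape: (4, 4)</small><table><thead><tr><th>a</th><th>b</th><th>c</th><th>d</th></tr><tr><td>i32</td><td>i32</td><td>str</td><td>str</td></tr></thead><tbody><tr><td>100</td><td>50</td><td>&quot;x&quot;</td><td>&quot;a&quot;</td></tr><tr><td>-1</td><td>-1</td><td>&quot;y&quot;</td><td>&quot;b&quot;</td></tr><tr><td>300</td><td>70</td><td>&quot;z&quot;</td><td>&quot;c&quot;</td></tr><tr><td>400</td><td>-2</td><td>&quot;w&quot;</td><td>&quot;d&quot;</td></tr></tbody></table></div>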
" ], "text/plain": [ "shape: (4, 4)\n", "┌─────┬─────┬─────┬─────┐\n", "│ a ┆ b ┆ c ┆ d │\n", "│ --- ┆ --- ┆ --- ┆ --- │\n", "│ i32 ┆ i32 ┆ str ┆ str │\n", "╞═════╪═════╪═════╪═════╡\n", "│ 100 ┆ 50 ┆ x ┆ a │\n", "│ -1 ┆ -1 ┆ y ┆ b │\n", "│ 300 ┆ 70 ┆ z ┆ c │\n", "│ 400 ┆ -2 ┆ w ┆ d │\n", "└─────┴─────┴─────┴─────┘" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 複数の行のデータを変更\n", "df[[1, 3], \"b\"] = -1, -2\n", "df" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" } }, "nbformat": 4, "nbformat_minor": 5 }