{ "cells": [ { "cell_type": "markdown", "id": "df5dc78a-4115-475d-b55b-67fd9eafbe84", "metadata": {}, "source": [ "# 未解決の課題" ] }, { "cell_type": "markdown", "id": "58c08d47-6997-407d-8e74-4fe8416b4b61", "metadata": {}, "source": [ "本章では、Polars において実装が困難である、または未解決の課題をまとめています。" ] }, { "cell_type": "code", "execution_count": 1, "id": "df022bf5-5251-4c30-a0b0-ca5494b8a42e", "metadata": {}, "outputs": [], "source": [ "import polars as pl" ] }, { "cell_type": "markdown", "id": "957601e2-c311-41dc-8a2f-24b131957f84", "metadata": {}, "source": [ "## scatter\n", "\n", "`Series.scatter()`はSeriesオブジェクトをその場で修正\n", "\n", "https://github.com/pola-rs/polars/issues/17332" ] }, { "cell_type": "code", "execution_count": 2, "id": "6d977889-0d5a-4961-9a9a-7e79eeedfc2b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shape: (5,)\n", "Series: '' [i64]\n", "[\n", "\t99\n", "\t2\n", "\t3\n", "\t4\n", "\t5\n", "]\n" ] } ], "source": [ "s = pl.Series([1, 2, 3, 4, 5])\n", "s.scatter(0, 99)\n", "print(s)" ] }, { "cell_type": "markdown", "id": "9fbc67aa-ca3a-4a50-bdea-5979df4489b6", "metadata": {}, "source": [ "`Expr.scatter()`が欲しいです。これがあれば、次の処理が簡単になります。\n", "\n", "https://github.com/pola-rs/polars/issues/13087" ] }, { "cell_type": "code", "execution_count": 35, "id": "41266474-023e-4c45-9d20-e6ea4a33af4b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (5, 3)
ABC
i64i64i64
101
25100
39210
424
510320
" ], "text/plain": [ "shape: (5, 3)\n", "┌─────┬─────┬─────┐\n", "│ A ┆ B ┆ C │\n", "│ --- ┆ --- ┆ --- │\n", "│ i64 ┆ i64 ┆ i64 │\n", "╞═════╪═════╪═════╡\n", "│ 1 ┆ 0 ┆ 1 │\n", "│ 2 ┆ 5 ┆ 100 │\n", "│ 3 ┆ 9 ┆ 210 │\n", "│ 4 ┆ 2 ┆ 4 │\n", "│ 5 ┆ 10 ┆ 320 │\n", "└─────┴─────┴─────┘" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pl.DataFrame(dict(\n", " A=[1, 2, 3, 4, 5],\n", " B=[0, 5, 9, 2, 10],\n", "))\n", "\n", "def set_elements(cols):\n", " a, b = cols\n", " return a.scatter((a < b).arg_true(), [100, 210, 320])\n", "\n", "df2 = df.with_columns(\n", " pl.map_batches(['A', 'B'], set_elements).alias('C')\n", ")\n", "df2" ] }, { "cell_type": "markdown", "id": "a0d953a2-3e43-46f1-815f-feebe725c00e", "metadata": {}, "source": [ "次は`set_by_mask()`を`gather()`で実装します。" ] }, { "cell_type": "code", "execution_count": 65, "id": "4ed79352-d256-4a7c-a219-898a4eda63b2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (5, 3)
ABC
i64i64i64
101
25100
39200
424
510300
" ], "text/plain": [ "shape: (5, 3)\n", "┌─────┬─────┬─────┐\n", "│ A ┆ B ┆ C │\n", "│ --- ┆ --- ┆ --- │\n", "│ i64 ┆ i64 ┆ i64 │\n", "╞═════╪═════╪═════╡\n", "│ 1 ┆ 0 ┆ 1 │\n", "│ 2 ┆ 5 ┆ 100 │\n", "│ 3 ┆ 9 ┆ 200 │\n", "│ 4 ┆ 2 ┆ 4 │\n", "│ 5 ┆ 10 ┆ 300 │\n", "└─────┴─────┴─────┘" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "def set_by_mask(old_values, cond_expr, new_values):\n", " if isinstance(new_values, (tuple, list)):\n", " new_values = pl.lit(new_values).explode()\n", " elif isinstance(new_values, (np.ndarray, pl.Series)):\n", " new_values = pl.lit(new_values)\n", " \n", " return new_values.gather(pl.when(cond_expr).then(cond_expr.cum_sum()).otherwise(None) - 1).fill_null(old_values)\n", "\n", "df.with_columns(C=set_by_mask(pl.col('A'), pl.col('A') < pl.col('B'), np.array([100, 200, 300])))" ] }, { "cell_type": "code", "execution_count": 66, "id": "976c708e-df63-4982-850e-7916db697ec2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, 2, 200, 4, 100, 6])" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.array([1, 2, 3, 4, 5, 6])\n", "index =[4, 2]\n", "value = [100, 200]\n", "arr[index] = value\n", "arr" ] }, { "cell_type": "code", "execution_count": null, "id": "6204b2a1-5566-4a7a-ae41-d00fcd81e874", "metadata": {}, "outputs": [], "source": [ "[None, None, 1, None, 0, None]" ] }, { "cell_type": "markdown", "id": "42164b18-2763-4927-b0e0-1aa212fe67e2", "metadata": {}, "source": [ "## rolling ignore NULL" ] }, { "cell_type": "markdown", "id": "e27bf981-8b59-48ea-82a6-417d34f732e2", "metadata": {}, "source": [ "`rolling_*()`はNULLに当たると、結果はNULLになります。" ] }, { "cell_type": "code", "execution_count": 12, "id": "9d140d60-586f-4693-909a-4d1b5885bd35", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (5, 1)
A
f64
null
null
null
2.5
1.5
" ], "text/plain": [ "shape: (5, 1)\n", "┌──────┐\n", "│ A │\n", "│ --- │\n", "│ f64 │\n", "╞══════╡\n", "│ null │\n", "│ null │\n", "│ null │\n", "│ 2.5 │\n", "│ 1.5 │\n", "└──────┘" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import polars as pl\n", "\n", "df = pl.DataFrame(\n", " {\n", " \"A\": [5, None, 3, 2, 1],\n", " \"B\": [5, 3, None, 2, 1],\n", " \"C\": [None, None, None, None, None],\n", " }\n", ")\n", "\n", "df.select(pl.col('A').rolling_mean(2))" ] }, { "cell_type": "markdown", "id": "8beddc44-970f-4c00-b104-0fdde2945afe", "metadata": {}, "source": [ "次のコードはNULLではないデータに対して、`rolling_mean()`を計算し、元のNULLと結合します。" ] }, { "cell_type": "code", "execution_count": null, "id": "d066152c-b8e5-45ff-aec7-17681322e542", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (5, 6)
ABCA.meanB.meanC.mean
i64i64nullf64f64f64
55nullnullnullnull
null3nullnull4.0null
3nullnull4.0nullnull
22null2.52.5null
11null1.51.5null
" ], "text/plain": [ "shape: (5, 6)\n", "┌──────┬──────┬──────┬────────┬────────┬────────┐\n", "│ A ┆ B ┆ C ┆ A.mean ┆ B.mean ┆ C.mean │\n", "│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ i64 ┆ i64 ┆ null ┆ f64 ┆ f64 ┆ f64 │\n", "╞══════╪══════╪══════╪════════╪════════╪════════╡\n", "│ 5 ┆ 5 ┆ null ┆ null ┆ null ┆ null │\n", "│ null ┆ 3 ┆ null ┆ null ┆ 4.0 ┆ null │\n", "│ 3 ┆ null ┆ null ┆ 4.0 ┆ null ┆ null │\n", "│ 2 ┆ 2 ┆ null ┆ 2.5 ┆ 2.5 ┆ null │\n", "│ 1 ┆ 1 ┆ null ┆ 1.5 ┆ 1.5 ┆ null │\n", "└──────┴──────┴──────┴────────┴────────┴────────┘" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_res = df.with_columns(\n", " pl.col(\"A\", \"B\", \"C\")\n", " .rolling_mean(2)\n", " .over(pl.col(\"A\", \"B\", \"C\").is_null())\n", " .name.suffix('.mean')\n", ")\n", "df_res" ] }, { "cell_type": "markdown", "id": "18b7d1df-34a4-48d1-b0ed-1f0fd038b907", "metadata": {}, "source": [ "次のコードは`rolling()`で、演算式を窓口に適用します。`.mean()`はNULL無視できます。この場合は`index`列が必要です。" ] }, { "cell_type": "code", "execution_count": 10, "id": "8121fa9c-c0cd-443d-8722-772d39953f63", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (5, 1)
A
f64
5.0
5.0
3.0
2.5
1.5
" ], "text/plain": [ "shape: (5, 1)\n", "┌─────┐\n", "│ A │\n", "│ --- │\n", "│ f64 │\n", "╞═════╡\n", "│ 5.0 │\n", "│ 5.0 │\n", "│ 3.0 │\n", "│ 2.5 │\n", "│ 1.5 │\n", "└─────┘" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.with_row_index().select(\n", " pl.col('A').mean().rolling('index', period='2i')\n", ")" ] }, { "cell_type": "markdown", "id": "a6e24c89-18d4-4e65-8c99-7572ca541d62", "metadata": {}, "source": [ "```{mermaid}\n", "flowchart LR\n", " A[Hard] -->|Text| B(Round)\n", " B --> C{Decision}\n", " C -->|Yes| D[Result 1]\n", " C -->|No| E[Result 2]\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 5 }