{
"cells": [
{
"cell_type": "markdown",
"id": "df5dc78a-4115-475d-b55b-67fd9eafbe84",
"metadata": {},
"source": [
"# 未解決の課題"
]
},
{
"cell_type": "markdown",
"id": "58c08d47-6997-407d-8e74-4fe8416b4b61",
"metadata": {},
"source": [
"本章では、Polars において実装が困難である、または未解決の課題をまとめています。"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "df022bf5-5251-4c30-a0b0-ca5494b8a42e",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl"
]
},
{
"cell_type": "markdown",
"id": "957601e2-c311-41dc-8a2f-24b131957f84",
"metadata": {},
"source": [
"## scatter\n",
"\n",
"`Series.scatter()`はSeriesオブジェクトをその場で修正\n",
"\n",
"https://github.com/pola-rs/polars/issues/17332"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6d977889-0d5a-4961-9a9a-7e79eeedfc2b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"shape: (5,)\n",
"Series: '' [i64]\n",
"[\n",
"\t99\n",
"\t2\n",
"\t3\n",
"\t4\n",
"\t5\n",
"]\n"
]
}
],
"source": [
"s = pl.Series([1, 2, 3, 4, 5])\n",
"s.scatter(0, 99)\n",
"print(s)"
]
},
{
"cell_type": "markdown",
"id": "9fbc67aa-ca3a-4a50-bdea-5979df4489b6",
"metadata": {},
"source": [
"`Expr.scatter()`が欲しいです。これがあれば、次の処理が簡単になります。\n",
"\n",
"https://github.com/pola-rs/polars/issues/13087"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "41266474-023e-4c45-9d20-e6ea4a33af4b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
shape: (5, 3)A | B | C |
---|
i64 | i64 | i64 |
1 | 0 | 1 |
2 | 5 | 100 |
3 | 9 | 210 |
4 | 2 | 4 |
5 | 10 | 320 |
"
],
"text/plain": [
"shape: (5, 3)\n",
"┌─────┬─────┬─────┐\n",
"│ A ┆ B ┆ C │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ i64 ┆ i64 │\n",
"╞═════╪═════╪═════╡\n",
"│ 1 ┆ 0 ┆ 1 │\n",
"│ 2 ┆ 5 ┆ 100 │\n",
"│ 3 ┆ 9 ┆ 210 │\n",
"│ 4 ┆ 2 ┆ 4 │\n",
"│ 5 ┆ 10 ┆ 320 │\n",
"└─────┴─────┴─────┘"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pl.DataFrame(dict(\n",
" A=[1, 2, 3, 4, 5],\n",
" B=[0, 5, 9, 2, 10],\n",
"))\n",
"\n",
"def set_elements(cols):\n",
" a, b = cols\n",
" return a.scatter((a < b).arg_true(), [100, 210, 320])\n",
"\n",
"df2 = df.with_columns(\n",
" pl.map_batches(['A', 'B'], set_elements).alias('C')\n",
")\n",
"df2"
]
},
{
"cell_type": "markdown",
"id": "a0d953a2-3e43-46f1-815f-feebe725c00e",
"metadata": {},
"source": [
"次は`set_by_mask()`を`gather()`で実装します。"
]
},
{
"cell_type": "code",
"execution_count": 65,
"id": "4ed79352-d256-4a7c-a219-898a4eda63b2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 3)A | B | C |
---|
i64 | i64 | i64 |
1 | 0 | 1 |
2 | 5 | 100 |
3 | 9 | 200 |
4 | 2 | 4 |
5 | 10 | 300 |
"
],
"text/plain": [
"shape: (5, 3)\n",
"┌─────┬─────┬─────┐\n",
"│ A ┆ B ┆ C │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ i64 ┆ i64 │\n",
"╞═════╪═════╪═════╡\n",
"│ 1 ┆ 0 ┆ 1 │\n",
"│ 2 ┆ 5 ┆ 100 │\n",
"│ 3 ┆ 9 ┆ 200 │\n",
"│ 4 ┆ 2 ┆ 4 │\n",
"│ 5 ┆ 10 ┆ 300 │\n",
"└─────┴─────┴─────┘"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"\n",
"def set_by_mask(old_values, cond_expr, new_values):\n",
" if isinstance(new_values, (tuple, list)):\n",
" new_values = pl.lit(new_values).explode()\n",
" elif isinstance(new_values, (np.ndarray, pl.Series)):\n",
" new_values = pl.lit(new_values)\n",
" \n",
" return new_values.gather(pl.when(cond_expr).then(cond_expr.cum_sum()).otherwise(None) - 1).fill_null(old_values)\n",
"\n",
"df.with_columns(C=set_by_mask(pl.col('A'), pl.col('A') < pl.col('B'), np.array([100, 200, 300])))"
]
},
{
"cell_type": "code",
"execution_count": 66,
"id": "976c708e-df63-4982-850e-7916db697ec2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, 2, 200, 4, 100, 6])"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"arr = np.array([1, 2, 3, 4, 5, 6])\n",
"index =[4, 2]\n",
"value = [100, 200]\n",
"arr[index] = value\n",
"arr"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6204b2a1-5566-4a7a-ae41-d00fcd81e874",
"metadata": {},
"outputs": [],
"source": [
"[None, None, 1, None, 0, None]"
]
},
{
"cell_type": "markdown",
"id": "42164b18-2763-4927-b0e0-1aa212fe67e2",
"metadata": {},
"source": [
"## rolling ignore NULL"
]
},
{
"cell_type": "markdown",
"id": "e27bf981-8b59-48ea-82a6-417d34f732e2",
"metadata": {},
"source": [
"`rolling_*()`はNULLに当たると、結果はNULLになります。"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "9d140d60-586f-4693-909a-4d1b5885bd35",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
"shape: (5, 1)\n",
"┌──────┐\n",
"│ A │\n",
"│ --- │\n",
"│ f64 │\n",
"╞══════╡\n",
"│ null │\n",
"│ null │\n",
"│ null │\n",
"│ 2.5 │\n",
"│ 1.5 │\n",
"└──────┘"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import polars as pl\n",
"\n",
"df = pl.DataFrame(\n",
" {\n",
" \"A\": [5, None, 3, 2, 1],\n",
" \"B\": [5, 3, None, 2, 1],\n",
" \"C\": [None, None, None, None, None],\n",
" }\n",
")\n",
"\n",
"df.select(pl.col('A').rolling_mean(2))"
]
},
{
"cell_type": "markdown",
"id": "8beddc44-970f-4c00-b104-0fdde2945afe",
"metadata": {},
"source": [
"次のコードはNULLではないデータに対して、`rolling_mean()`を計算し、元のNULLと結合します。"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d066152c-b8e5-45ff-aec7-17681322e542",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 6)A | B | C | A.mean | B.mean | C.mean |
---|
i64 | i64 | null | f64 | f64 | f64 |
5 | 5 | null | null | null | null |
null | 3 | null | null | 4.0 | null |
3 | null | null | 4.0 | null | null |
2 | 2 | null | 2.5 | 2.5 | null |
1 | 1 | null | 1.5 | 1.5 | null |
"
],
"text/plain": [
"shape: (5, 6)\n",
"┌──────┬──────┬──────┬────────┬────────┬────────┐\n",
"│ A ┆ B ┆ C ┆ A.mean ┆ B.mean ┆ C.mean │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ i64 ┆ null ┆ f64 ┆ f64 ┆ f64 │\n",
"╞══════╪══════╪══════╪════════╪════════╪════════╡\n",
"│ 5 ┆ 5 ┆ null ┆ null ┆ null ┆ null │\n",
"│ null ┆ 3 ┆ null ┆ null ┆ 4.0 ┆ null │\n",
"│ 3 ┆ null ┆ null ┆ 4.0 ┆ null ┆ null │\n",
"│ 2 ┆ 2 ┆ null ┆ 2.5 ┆ 2.5 ┆ null │\n",
"│ 1 ┆ 1 ┆ null ┆ 1.5 ┆ 1.5 ┆ null │\n",
"└──────┴──────┴──────┴────────┴────────┴────────┘"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_res = df.with_columns(\n",
" pl.col(\"A\", \"B\", \"C\")\n",
" .rolling_mean(2)\n",
" .over(pl.col(\"A\", \"B\", \"C\").is_null())\n",
" .name.suffix('.mean')\n",
")\n",
"df_res"
]
},
{
"cell_type": "markdown",
"id": "18b7d1df-34a4-48d1-b0ed-1f0fd038b907",
"metadata": {},
"source": [
"次のコードは`rolling()`で、演算式を窓口に適用します。`.mean()`はNULL無視できます。この場合は`index`列が必要です。"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "8121fa9c-c0cd-443d-8722-772d39953f63",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
"shape: (5, 1)\n",
"┌─────┐\n",
"│ A │\n",
"│ --- │\n",
"│ f64 │\n",
"╞═════╡\n",
"│ 5.0 │\n",
"│ 5.0 │\n",
"│ 3.0 │\n",
"│ 2.5 │\n",
"│ 1.5 │\n",
"└─────┘"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.with_row_index().select(\n",
" pl.col('A').mean().rolling('index', period='2i')\n",
")"
]
},
{
"cell_type": "markdown",
"id": "a6e24c89-18d4-4e65-8c99-7572ca541d62",
"metadata": {},
"source": [
"```{mermaid}\n",
"flowchart LR\n",
" A[Hard] -->|Text| B(Round)\n",
" B --> C{Decision}\n",
" C -->|Yes| D[Result 1]\n",
" C -->|No| E[Result 2]\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}