Unresolved Issues

This chapter collects problems that are difficult to implement in Polars or that remain unresolved.

import polars as pl

scatter

Series.scatter() modifies the Series object in place.

pola-rs/polars#17332

s = pl.Series([1, 2, 3, 4, 5])
s.scatter(0, 99)
print(s)

shape: (5,)
Series: '' [i64]
[
	99
	2
	3
	4
	5
]

It would be nice to have Expr.scatter(). With it, the following operation would become simple; for now it has to go through map_batches().

pola-rs/polars#13087

df = pl.DataFrame(dict(
    A=[1, 2, 3, 4, 5],
    B=[0, 5, 9, 2, 10],
))

def set_elements(cols):
    a, b = cols
    return a.scatter((a < b).arg_true(), [100, 210, 320])

df2 = df.with_columns(
    pl.map_batches(['A', 'B'], set_elements).alias('C')
)
df2

shape: (5, 3)

A	B	C
i64	i64	i64
1	0	1
2	5	100
3	9	210
4	2	4
5	10	320

Next, we implement set_by_mask() using gather().

import numpy as np

def set_by_mask(old_values, cond_expr, new_values):
    if isinstance(new_values, (tuple, list)):
        new_values = pl.lit(new_values).explode()
    elif isinstance(new_values, (np.ndarray, pl.Series)):
        new_values = pl.lit(new_values)
        
    return new_values.gather(pl.when(cond_expr).then(cond_expr.cum_sum()).otherwise(None) - 1).fill_null(old_values)

df.with_columns(C=set_by_mask(pl.col('A'), pl.col('A') < pl.col('B'), np.array([100, 200, 300])))

shape: (5, 3)

A	B	C
i64	i64	i64
1	0	1
2	5	100
3	9	200
4	2	4
5	10	300

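For comparison, NumPy can assign values by index directly:

arr = np.array([1, 2, 3, 4, 5, 6])
index = [4, 2]
value = [100, 200]
arr[index] = value
arr

array([  1,   2, 200,   4, 100,   6])

Expressed as a gather(), the same assignment needs, for each position, the index into value to take, or None where the original element is kept:

[None, None, 1, None, 0, None]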

rolling ignore NULL

When a rolling_*() window contains a NULL, the result for that window is NULL.

import polars as pl

df = pl.DataFrame(
    {
        "A": [5, None, 3, 2, 1],
        "B": [5, 3, None, 2, 1],
        "C": [None, None, None, None, None],
    }
)

df.select(pl.col('A').rolling_mean(2))

shape: (5, 1)

A
f64
null
null
null
2.5
1.5

The following code computes rolling_mean() over the non-NULL values only and then recombines the result with the original NULL positions.

df_res = df.with_columns(
    pl.col("A", "B", "C")
      .rolling_mean(2)
      .over(pl.col("A", "B", "C").is_null())
      .name.suffix('.mean')
)
df_res

shape: (5, 6)

A	B	C	A.mean	B.mean	C.mean
i64	i64	null	f64	f64	f64
5	5	null	null	null	null
null	3	null	null	4.0	null
3	null	null	4.0	null	null
2	2	null	2.5	2.5	null
1	1	null	1.5	1.5	null

The following code uses rolling() to apply an expression to each window. .mean() can ignore NULLs. This approach requires an index column.

df.with_row_index().select(
    pl.col('A').mean().rolling('index', period='2i')
)

shape: (5, 1)

A
f64
5.0
5.0
3.0
2.5
1.5
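The two approaches above do not produce the same numbers for column A: the over(is_null()) version slides the window across the non-NULL values only, while the rolling() version keeps positional windows and drops the NULLs inside each one. A plain-Python reference makes the difference concrete; both helper names are hypothetical and exist only for illustration.

```python
def rolling_mean_compacted(values, window):
    """Rolling mean over the non-None values only; None stays None."""
    kept = [v for v in values if v is not None]
    means = [
        sum(kept[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        for i in range(len(kept))
    ]
    it = iter(means)
    # Put the results back at the positions of the non-None inputs.
    return [None if v is None else next(it) for v in values]


def rolling_mean_skip_none(values, window):
    """Mean of the non-None values inside each positional window."""
    out = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - window + 1) : i + 1] if v is not None]
        out.append(sum(chunk) / len(chunk) if chunk else None)
    return out


a = [5, None, 3, 2, 1]
print(rolling_mean_compacted(a, 2))  # [None, None, 4.0, 2.5, 1.5]
print(rolling_mean_skip_none(a, 2))  # [5.0, 5.0, 3.0, 2.5, 1.5]
```

The first result matches the A.mean column and the second matches the rolling() output shown above.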
