두 판다 기둥의 끈 연결

code

두 판다 기둥의 끈 연결

starcafe 2023. 7. 22. 10:18

두 판다 기둥의 끈 연결

나는 다음이 있습니다.DataFrame:

from pandas import *
df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3]})

다음과 같이 표시됩니다.

    bar foo
0    1   a
1    2   b
2    3   c

이제 저는 다음과 같은 것을 원합니다.

     bar
0    1 is a
1    2 is b
2    3 is c

어떻게 하면 이를 달성할 수 있을까요?다음을 시도했습니다.

df['foo'] = '%s is %s' % (df['bar'], df['foo'])

하지만 그것은 나에게 잘못된 결과를 줍니다.

>>>print df.ix[0]

bar                                                    a
foo    0    a
1    b
2    c
Name: bar is 0    1
1    2
2
Name: 0

바보 같은 질문이라 미안하지만, 판다 한 마리는 데이터 프레임에 두 개의 열을 결합하는 것은 저에게 도움이 되지 않았습니다.

df['bar'] = df.bar.map(str) + " is " + df.foo

이 질문은 이미 답변이 되었지만, 이전에 논의되지 않았던 유용한 방법들을 혼합에 던져두고, 지금까지 제안된 모든 방법들을 성능 면에서 비교해 보면 좋을 것 같습니다.

다음은 성능 순서에 따라 이 문제에 대한 몇 가지 유용한 해결책입니다.

`DataFrame.agg`

이는 단순한 접근 방식입니다.

df['baz'] = df.agg('{0[bar]} is {0[foo]}'.format, axis=1)
df
  foo  bar     baz
0   a    1  1 is a
1   b    2  2 is b
2   c    3  3 is c

여기서 f-string 형식을 사용할 수도 있습니다.

df['baz'] = df.agg(lambda x: f"{x['bar']} is {x['foo']}", axis=1)
df
  foo  bar     baz
0   a    1  1 is a
1   b    2  2 is b
2   c    3  3 is c

`char.array`기반 연결

연결할 열을 다음으로 변환합니다.chararrays그런 다음 그것들을 함께 더합니다.

a = np.char.array(df['bar'].values)
b = np.char.array(df['foo'].values)

df['baz'] = (a + b' is ' + b).astype(str)
df
  foo  bar     baz
0   a    1  1 is a
1   b    2  2 is b
2   c    3  3 is c

이해력 나열하기

나는 판다의 목록 이해가 얼마나 과소평가되었는지 과장할 수 없습니다.

df['baz'] = [str(x) + ' is ' + y for x, y in zip(df['bar'], df['foo'])]

또는 사용str.join콘케이트(확장성도 향상됨):

df['baz'] = [
    ' '.join([str(x), 'is', y]) for x, y in zip(df['bar'], df['foo'])]

df
  foo  bar     baz
0   a    1  1 is a
1   b    2  2 is b
2   c    3  3 is c

문자열 연산은 본질적으로 벡터화하기 어렵고 대부분의 판다 "벡터화된" 함수는 기본적으로 루프 주위의 래퍼이기 때문에 목록 이해는 문자열 조작에 탁월합니다.저는 이 주제에 대해 '팬더와의 루프를 위해'에서 광범위하게 글을 썼습니다. - 언제까지 신경을 써야 할까요?일반적으로 인덱스 정렬에 대해 걱정할 필요가 없는 경우 문자열 및 정규식 작업을 처리할 때 목록 이해를 사용합니다.

위의 목록 컴포트는 기본적으로 NaNs를 처리하지 않습니다.그러나 사용자가 처리해야 하는 경우를 제외하고는 언제든지 시도를 포장하여 함수를 작성할 수 있습니다.

def try_concat(x, y):
    try:
        return str(x) + ' is ' + y
    except (ValueError, TypeError):
        return np.nan


df['baz'] = [try_concat(x, y) for x, y in zip(df['bar'], df['foo'])]

`perfplot`성능 측정

성능 그림을 사용하여 생성된 그래프.전체 코드 목록입니다.

기능들

def brenbarn(df):
    return df.assign(baz=df.bar.map(str) + " is " + df.foo)

def danielvelkov(df):
    return df.assign(baz=df.apply(
        lambda x:'%s is %s' % (x['bar'],x['foo']),axis=1))

def chrimuelle(df):
    return df.assign(
        baz=df['bar'].astype(str).str.cat(df['foo'].values, sep=' is '))

def vladimiryashin(df):
    return df.assign(baz=df.astype(str).apply(lambda x: ' is '.join(x), axis=1))

def erickfis(df):
    return df.assign(
        baz=df.apply(lambda x: f"{x['bar']} is {x['foo']}", axis=1))

def cs1_format(df):
    return df.assign(baz=df.agg('{0[bar]} is {0[foo]}'.format, axis=1))

def cs1_fstrings(df):
    return df.assign(baz=df.agg(lambda x: f"{x['bar']} is {x['foo']}", axis=1))

def cs2(df):
    a = np.char.array(df['bar'].values)
    b = np.char.array(df['foo'].values)

    return df.assign(baz=(a + b' is ' + b).astype(str))

def cs3(df):
    return df.assign(
        baz=[str(x) + ' is ' + y for x, y in zip(df['bar'], df['foo'])])

코드의 문제는 모든 행에 작업을 적용하려는 것입니다.당신이 작성한 방식은 전체 'bar'와 'foo' 열을 사용하여 문자열로 변환하고 하나의 큰 문자열을 반환합니다.다음과 같이 쓸 수 있습니다.

df.apply(lambda x:'%s is %s' % (x['bar'],x['foo']),axis=1)

다른 답변보다 길지만 더 일반적입니다(문자열이 아닌 값과 함께 사용할 수 있음).

사용할 수도 있습니다.

df['bar'] = df['bar'].str.cat(df['foo'].values.astype(str), sep=' is ')

df.astype(str).apply(lambda x: ' is '.join(x), axis=1)

0    1 is a
1    2 is b
2    3 is c
dtype: object

series.str.cat이 문제에 가장 유연하게 접근할 수 있는 방법은 다음과 같습니다.

위해서df = pd.DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3]})

df.foo.str.cat(df.bar.astype(str), sep=' is ')

>>>  0    a is 1
     1    b is 2
     2    c is 3
     Name: foo, dtype: object

df.bar.astype(str).str.cat(df.foo, sep=' is ')

>>>  0    1 is a
     1    2 is b
     2    3 is c
     Name: bar, dtype: object

와는 달리.join()(단일 시리즈에 포함된 목록을 결합하기 위한 것), 이 방법은 2 시리즈를 함께 결합하기 위한 것입니다.또한 무시하거나 대체할 수 있습니다.NaN원하는 대로 값을 입력합니다.

@Daniel Velkov 답변이 적절한 답변이지만 문자열 리터럴을 사용하는 것이 더 빠릅니다.

# Daniel's
%timeit df.apply(lambda x:'%s is %s' % (x['bar'],x['foo']),axis=1)
## 963 µs ± 157 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# String literals - python 3
%timeit df.apply(lambda x: f"{x['bar']} is {x['foo']}", axis=1)
## 849 µs ± 4.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

10년이 지났지만 아무도 지난 10년 동안 제안된 모든 사례보다 50% 빠른 가장 간단하고 직관적인 방법을 제안하지 않았습니다.

df.bar.astype(str) + ' is ' + df.foo

임의의 수의 열에 대한 가장 간결한 해결책은 이 답변의 짧은 형식 버전이라고 생각합니다.

df.astype(str).apply(' is '.join, axis=1)

다음을 사용하여 두 개의 문자를 더 줄일 수 있습니다.df.agg()하지만 더 느립니다.

df.astype(str).agg(' is '.join, axis=1)

데이터 프레임에 10^11 행이 있는 특정 사례를 접했으며, 이 경우 제안된 솔루션은 적합하지 않습니다.카테고리를 사용해 본 적이 있으며, 고유 문자열 수가 너무 많지 않은 경우에는 모두 잘 작동합니다.이것은 요인이 있는 XxY가 있는 R 소프트웨어에서 쉽게 할 수 있지만 저는 파이썬(파이썬이 처음입니다)에서 다른 방법을 찾을 수 없었습니다.이것이 구현되는 장소를 아는 사람이 있다면 기꺼이 알고 싶습니다.

def Create_Interaction_var(df,Varnames):
    '''
    :df data frame
    :list of 2 column names, say "X" and "Y". 
    The two columns should be strings or categories
    convert strings columns to categories
    Add a column with the "interaction of X and Y" : X x Y, with name 
    "Interaction-X_Y"
    '''
    df.loc[:, Varnames[0]] = df.loc[:, Varnames[0]].astype("category")
    df.loc[:, Varnames[1]] = df.loc[:, Varnames[1]].astype("category")
    CatVar = "Interaction-" + "-".join(Varnames)
    Var0Levels = pd.DataFrame(enumerate(df.loc[:,Varnames[0]].cat.categories)).rename(columns={0 : "code0",1 : "name0"})
    Var1Levels = pd.DataFrame(enumerate(df.loc[:,Varnames[1]].cat.categories)).rename(columns={0 : "code1",1 : "name1"})
    NbLevels=len(Var0Levels)

    names = pd.DataFrame(list(itertools.product(dict(enumerate(df.loc[:,Varnames[0]].cat.categories)),
                                                dict(enumerate(df.loc[:,Varnames[1]].cat.categories)))),
                         columns=['code0', 'code1']).merge(Var0Levels,on="code0").merge(Var1Levels,on="code1")
    names=names.assign(Interaction=[str(x) + '_' + y for x, y in zip(names["name0"], names["name1"])])
    names["code01"]=names["code0"] + NbLevels*names["code1"]
    df.loc[:,CatVar]=df.loc[:,Varnames[0]].cat.codes+NbLevels*df.loc[:,Varnames[1]].cat.codes
    df.loc[:, CatVar]=  df[[CatVar]].replace(names.set_index("code01")[["Interaction"]].to_dict()['Interaction'])[CatVar]
    df.loc[:, CatVar] = df.loc[:, CatVar].astype("category")
    return df

from pandas import *
x = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3]})
x

 x['bar'] = x.bar.astype("str") + " " + "is" + " " + x.foo
x.drop(['foo'], axis=1)

언급URL : https://stackoverflow.com/questions/11858472/string-concatenation-of-two-pandas-columns

'code' 카테고리의 다른 글

스프링 데이터 탄력적 검색: 동일한 문서가 있는 다중 인덱스 (0)	2023.07.22
한 줄에 try/except를 어떻게 넣어야 합니까? (0)	2023.07.22
표 2에서 표 1 업데이트(새로 삽입, 업데이트만 변경, 누락 제거) (0)	2023.07.22
효율적인 양방향 해시 테이블을 구현하는 방법은 무엇입니까? (0)	2023.07.22
SpringBoot 파일 업로드 크기 제한이 Multipart 가져오기를 초과하는 경우최대 업로드 크기 대신 예외 발생초과됨예외. (0)	2023.07.22

현재글두 판다 기둥의 끈 연결

각종 프로그래밍 정보를 다루는 블로그입니다.

Excel, json, PYTHON, MariaDB, bash, spring-boot, mysql, mongodb, angularJS, SWIFT, C, sql-server, PowerShell, wordpress, Git, ajax, Oracle, jQuery, ReactJS, asp.net,

Today :
Yesterday :

starcafe

두 판다 기둥의 끈 연결

두 판다 기둥의 끈 연결

`DataFrame.agg`

`char.array`기반 연결

이해력 나열하기

`perfplot`성능 측정

'code' 카테고리의 다른 글

'code'의 다른글

티스토리툴바

« 2025/02 »
일	월	화	수	목	금	토
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

두 판다 기둥의 끈 연결

두 판다 기둥의 끈 연결

DataFrame.agg

char.array기반 연결

이해력 나열하기

perfplot성능 측정

'code' 카테고리의 다른 글

'code'의 다른글

관련글

티스토리툴바

`DataFrame.agg`

`char.array`기반 연결

`perfplot`성능 측정