[Building AI Fundamentals] 4. Solid Foundations, Solid Math (3)

now is better than never

김초송 2023. 2. 17. 18:20

4. Pandas II / A Taste of Probability Theory

3) Pandas 2

- Groupby

= SQL GROUP BY
The computation runs through three steps: split -> apply -> combine

df.groupby("column1")["column2"].func()
df.groupby(["column1-1", "column1-2"])["column2"].func()

column1 : the column(s) to group by
column2 : the column the operation is applied to
func : the aggregation function
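The pattern above can be sketched on a small table. This is only an illustration: the DataFrame and its column names (`team`, `year`, `points`) are made up for the example and do not come from the course material.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "year": [2021, 2022, 2021, 2021, 2022],
    "points": [10, 20, 5, 15, 30],
})

# One grouping column: split by team, sum the points, combine into a Series.
points_by_team = df.groupby("team")["points"].sum()

# Two grouping columns: the result carries a two-level (hierarchical) index.
points_by_team_year = df.groupby(["team", "year"])["points"].sum()

print(points_by_team)
print(points_by_team_year)
```

Grouping by a list of columns is what produces the hierarchical index discussed in the next part.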

- Hierarchical Index

groupby_df.unstack()

Converts grouped data into matrix form
Group by two columns (giving two index levels) -> unstack to create features
A commonly used data handling technique

groupby_df.swaplevel()

Swaps index levels
When there are two or more index levels, changes their order

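groupby_df.groupby(level=0).sum()

Performs the operation grouped by an index level
(the older form groupby_df.sum(level=0) was removed in pandas 2.0)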
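A minimal sketch of the three hierarchical-index operations, assuming a made-up DataFrame with hypothetical columns `team`, `year`, and `points`. The name `groupby_df` follows the notes; the data is invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "year": [2021, 2022, 2021, 2021, 2022],
    "points": [10, 20, 5, 15, 30],
})

# Grouping by two columns yields a Series with a (team, year) index.
groupby_df = df.groupby(["team", "year"])["points"].sum()

# unstack: the inner level (year) becomes columns, giving a team x year matrix.
matrix = groupby_df.unstack()

# swaplevel: the index order becomes (year, team); the values are unchanged.
swapped = groupby_df.swaplevel()

# Aggregate over the outer index level (team).
team_totals = groupby_df.groupby(level=0).sum()
```

If some (team, year) pair were missing, `unstack()` would leave a NaN in that cell, which is why it is often followed by `fillna(0)`.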

- Groupby : grouped

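The result of groupby is stored in its split state
Each group's key and value can be extracted as a tuple
Three kinds of apply are possible on the extracted groups
1. Aggregation: summary statistics
2. Transformation: converting the values (rarely used)
3. Filtration: filtering for specific information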

grouped = df.groupby("column")

for k, v in grouped:
    print(k) # key : one distinct value of the grouping column (cf. .unique())
    print(v) # value : the DataFrame of rows that belong to that key
    
grouped.get_group("column_key") # extracts only the group with the given key value

grouped.agg("sum") # same result as apply(sum)
'''
agg({'column1':func,
     'column2':[func1, func2]}) applies several operations to several columns
'''

grouped['column'].agg(['sum', 'mean', 'std']) # apply several functions to one column

# transform : standardize (z-score) the values within each group
score = lambda x : (x - x.mean()) / x.std()
grouped.transform(score)

# filter : keep only the groups for which the function returns True
df.groupby('column').filter(lambda x : boolean condition)
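The three kinds of apply listed above can be sketched side by side. The data, the column names (`team`, `points`), and the filter condition (groups with more than two rows) are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10.0, 20.0, 5.0, 15.0, 25.0],
})
grouped = df.groupby("team")

# 1. Aggregation: one row of summary statistics per group.
summary = grouped["points"].agg(["sum", "mean"])

# 2. Transformation: z-score within each group; the result has one value
#    per original row, aligned with df's index.
zscores = grouped["points"].transform(lambda x: (x - x.mean()) / x.std())

# 3. Filtration: keep the rows of groups that satisfy a condition
#    (here, an arbitrary one: groups with more than two rows).
large_groups = df.groupby("team").filter(lambda g: len(g) > 2)
```

Aggregation shrinks the data to one row per group, transformation keeps the original shape, and filtration returns a subset of the original rows.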

- Crosstab

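Used to compute cross frequencies, ratios, sums, and so on for two columns
A special case of the pivot table
Can be used to build things like a User-Item Rating Matrix
groupby > pivot table > crosstab: nearly the same, but each is a more specialized form of the one before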

pd.crosstab(index=rows, columns=columns, values=data_to_aggregate, aggfunc=func).fillna(0)
df.pivot_table([data_to_aggregate], index=rows, columns=columns, aggfunc=func, fill_value=0)
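As a rough sketch of the User-Item Rating Matrix use case, the same matrix can be built with either call. The `ratings` table and its column names are invented for the example.

```python
import pandas as pd

# Hypothetical ratings data; names and values are illustrative only.
ratings = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u3"],
    "item": ["i1", "i2", "i1", "i2"],
    "rating": [5, 3, 4, 2],
})

# crosstab takes the columns themselves (Series), not column names.
by_crosstab = pd.crosstab(
    index=ratings["user"],
    columns=ratings["item"],
    values=ratings["rating"],
    aggfunc="sum",
).fillna(0)

# pivot_table is called on the DataFrame and takes column names.
by_pivot = ratings.pivot_table(
    values="rating", index="user", columns="item",
    aggfunc="sum", fill_value=0,
)

print(by_crosstab)
print(by_pivot)
```

Both results have users as rows and items as columns, with 0 where a user has not rated an item.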

- Merge

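= SQL JOIN: combines two datasets into one
Sometimes many features are combined on a key value, or embedded with word2vec
-> a large vector is converted into a smaller one and fed into a deep learning model
Doing this in SQL takes a long time
Used when combining one-hot vectors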

pd.merge(df1, df2, on='key') # merge on key = inner join (the default)
pd.merge(df1, df2, left_on='key1', right_on='key2') # when the key columns have different names in the two DataFrames
# how : join type ('inner', 'left', 'right', 'outer')

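If on is not specified, merge uses the columns that share a name in both DataFrames as the key; to merge on the index, pass left_index=True / right_index=True (or use DataFrame.join)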
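The join variants above can be tried on two small tables. The DataFrames `left` and `right` and their columns are hypothetical and exist only for this sketch.

```python
import pandas as pd

# Hypothetical tables sharing a "key" column; values are illustrative only.
left = pd.DataFrame({"key": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "score": [20, 30, 40]})

# Inner join (the default): only keys present in both tables survive.
inner = pd.merge(left, right, on="key")

# Outer join: keys from either table, with NaN where a side has no match.
outer = pd.merge(left, right, on="key", how="outer")

# Key columns with different names on each side.
right_renamed = right.rename(columns={"key": "id"})
different_names = pd.merge(left, right_renamed, left_on="key", right_on="id")
```

With `left_on`/`right_on`, both key columns (`key` and `id`) are kept in the result, so one of them is usually dropped afterwards.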

- DB Persistence

Persistence : writing data held in memory out to a file

Database connection
- A DB connection can be used when loading data
XLS persistence
- Saves a DataFrame to a file by exporting it to Excel
- Uses openpyxl or XlsxWriter as the Excel engine
Pickle persistence
- The most common form of file persistence in Python
- Uses the to_pickle and read_pickle functions
- pickle: Python's object serialization format

# 1. Database connection
import sqlite3

conn = sqlite3.connect(".db") # connect to the DB
cur = conn.cursor()
cur.execute("select * from table")
result = cur.fetchall()
result

df = pd.read_sql_query("select * from table", conn) # build a DataFrame using the DB connection conn

# 2. XLS persistence
conda install openpyxl
conda install XlsxWriter

# the with block closes the writer, which is what actually writes the file to disk
with pd.ExcelWriter('df_routes.xlsx', engine='xlsxwriter') as writer:
    df_routes.to_excel(writer, sheet_name='Sheet1')

# 3. Pickle persistence
df_routes.to_pickle("df_routes.pickle")

df_routes_pickle = pd.read_pickle("df_routes.pickle")
df_routes_pickle.head()
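A self-contained sketch of the database and pickle round trips, assuming an in-memory SQLite database and a temporary directory so that nothing is left on disk. The contents of `df_routes` and the table name `routes` are made up for the example. The Excel path is left out because it needs openpyxl or XlsxWriter installed.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Hypothetical route data; the column names and values are illustrative only.
df_routes = pd.DataFrame({"origin": ["ICN", "GMP"], "dest": ["NRT", "CJU"]})

# Database round trip through an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df_routes.to_sql("routes", conn, index=False)  # write the DataFrame as a table
df_from_db = pd.read_sql_query("select * from routes", conn)
conn.close()

# Pickle round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df_routes.pickle")
    df_routes.to_pickle(path)
    df_from_pickle = pd.read_pickle(path)

print(df_from_db)
print(df_from_pickle)
```

Pickle preserves dtypes and the index exactly, whereas a database round trip goes through SQL types, so the two are not always interchangeable.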

+ Running SQL queries in Python

from pandasql import sqldf

pysqldf = lambda q : sqldf(q, globals())

q = """  select  *
        from table
    """

result = pysqldf(q)
