[Python] Building a Selenium Crawler and Writing the Result to Google Sheets

pbj0812's Coding Diary

pbj0812 2020. 1. 16. 01:47

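0. Goal

- Read the number shown next to the heart (like) button on the company website and write it to a Google Spreadsheet.

- BeautifulSoup alone could not extract the value, so Selenium is used as well.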

1. Prerequisites

1) Install the libraries

pip install beautifulsoup4
pip install selenium
pip install gspread oauth2client

2) Install Chrome

- Link

3) Install ChromeDriver

- Link

* Download the version that matches your installed Chrome.

- Checking the Chrome version

- Enter chrome://version/ in the address bar.
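
The version check can also be expressed in a few lines of code. The following is a minimal sketch that compares only the major version number, which is a simplification. The helper names and the version strings are illustrative assumptions; the values would be copied by hand from chrome://version/ and from running chromedriver --version.

```python
def major_version(version):
    """Return the leading (major) component of a dotted version string as an int."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version, driver_version):
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Illustrative version strings, not taken from the original post
print(versions_match("79.0.3945.117", "79.0.3945.36"))   # True
print(versions_match("79.0.3945.117", "78.0.3904.105"))  # False
```

If the two do not match, downloading the ChromeDriver build for the installed Chrome is usually the simpler fix.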

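2. Writing the code

1) Import the libraries

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

2) Set the ChromeDriver path

# Selenium 4 and later take the driver path through a Service object
driver = webdriver.Chrome(service=Service('/Users/pbj0812/Downloads/chromedriver'))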

3) Set an implicit wait

# Wait up to 3 seconds when looking up elements
driver.implicitly_wait(3)

4) Open the page

driver.get('https://www.wadiz.kr/web/equity/campaign/2899')

5) Crawl the data

# HTML as currently rendered by the browser
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
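
Note that driver.implicitly_wait applies to element lookups, not to driver.page_source, so the HTML may be captured before the like count has been rendered. One way to guard against this is a small polling loop. This is a hand-rolled sketch rather than Selenium's own WebDriverWait; the function name wait_for_markup, the marker string, and the timeout values are assumptions.

```python
import time


def wait_for_markup(driver, marker, timeout=10.0, poll=0.5):
    """Poll driver.page_source until `marker` appears or `timeout` seconds pass.

    `driver` only needs a `page_source` attribute, as a Selenium WebDriver has.
    Returns the HTML containing the marker; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found within {timeout} s")
        time.sleep(poll)


# Possible use in place of reading page_source directly (assumed marker):
#   html = wait_for_markup(driver, 'EquityCampaignButton_likeButton')
#   soup = BeautifulSoup(html, 'html.parser')
```

The returned HTML can then be passed to BeautifulSoup exactly as before.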

6) Find the target content

- Check the tag

# The like button, then the spans inside it that have no class attribute
aaa = soup.find('button', {'class':'EquityCampaignButton_likeButton__25vph'})
bbb = aaa.find_all('span', {'class':None})

- Result: [<span>211</span>]
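
The suffix __25vph in the class name looks like a build-generated hash (an assumption), so it may change when the site is redeployed. A sketch of a more tolerant lookup matches on the class prefix instead. BeautifulSoup accepts a function as a class_ filter and passes it the tag's class values, or None when the tag has no class. The names LIKE_BUTTON_PREFIX and is_like_button_class, and the second sample class, are hypothetical.

```python
LIKE_BUTTON_PREFIX = "EquityCampaignButton_likeButton"


def is_like_button_class(css_class):
    """Match any class that starts with the stable prefix; None means no class."""
    return css_class is not None and css_class.startswith(LIKE_BUTTON_PREFIX)


# With the soup from step 5, the predicate can replace the exact class name:
#   aaa = soup.find('button', class_=is_like_button_class)
print(is_like_button_class("EquityCampaignButton_likeButton__25vph"))   # True
print(is_like_button_class("EquityCampaignButton_shareButton__abc12"))  # False
```

soup.find returns None when nothing matches, so it is worth checking aaa before calling find_all on it.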

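7) Extract the number

- Use a regular expression.

- type(bbb) is bs4.element.ResultSet, so convert it to a string with str() before applying the regular expression.

import re
# Raw string avoids the invalid-escape warning for \d
result = int(re.findall(r'\d+', str(bbb))[0])

- Result: 211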
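
Indexing [0] on the findall result raises IndexError when no digits are present, for example if the button was not found. A slightly more defensive variant is sketched below; extract_count is a hypothetical helper, and the handling of thousands separators is an assumption about how larger counts might be rendered.

```python
import re


def extract_count(markup):
    """Return the first integer found in `markup`, or None when there is none.

    Thousands separators such as '1,234' are accepted.
    """
    match = re.search(r"\d[\d,]*", markup)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


print(extract_count("[<span>211</span>]"))    # 211
print(extract_count("[<span>1,234</span>]"))  # 1234
print(extract_count("[]"))                    # None
```

It would be called as extract_count(str(bbb)), with a check for None before writing to the sheet.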

8) Connect to Google Sheets (reference)

import gspread
from oauth2client.service_account import ServiceAccountCredentials
scope = ['https://spreadsheets.google.com/feeds']
# Service account key file downloaded from the Google API console
json_file_name = '/Users/pbj0812/Desktop/ouath/gspread-265016-de6a1ddc148e.json'
credentials = ServiceAccountCredentials.from_json_keyfile_name(json_file_name, scope)
gc = gspread.authorize(credentials)
spreadsheet_url = 'https://docs.google.com/spreadsheets/d/1FNsOyhG6i-1DLxCW6a0lY5vlNvGWep-5mhF9JWaQ2Vs/edit?usp=sharing'
# Open the spreadsheet document
doc = gc.open_by_url(spreadsheet_url)
# Select the worksheet (tab) by name
worksheet = doc.worksheet('sheet1')
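
The long string in spreadsheet_url between /d/ and /edit is the spreadsheet key, and gspread can also open a document by that key with gc.open_by_key. A minimal sketch of pulling the key out of a URL follows; the helper name spreadsheet_key is hypothetical, and the pattern assumes the usual docs.google.com URL layout.

```python
import re


def spreadsheet_key(url):
    """Return the document key embedded in a Google Sheets URL."""
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"no spreadsheet key found in {url!r}")
    return match.group(1)


url = ('https://docs.google.com/spreadsheets/d/'
       '1FNsOyhG6i-1DLxCW6a0lY5vlNvGWep-5mhF9JWaQ2Vs/edit?usp=sharing')
key = spreadsheet_key(url)
print(key)  # 1FNsOyhG6i-1DLxCW6a0lY5vlNvGWep-5mhF9JWaQ2Vs

# With the authorized client from this step:
#   doc = gc.open_by_key(key)
```

Storing only the key keeps sharing parameters such as ?usp=sharing out of the script.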

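9) Write the result to the Google Spreadsheet

- append_row adds a new row below the existing data, so the results accumulate from the top down.

worksheet.append_row([result])
driver.quit()  # close the browser once the value is saved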
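
Because each run appends a single value, the sheet ends up as a column of numbers with no indication of when each one was collected. A possible variation, sketched under the assumption that a two-column layout is acceptable, writes a timestamp next to the count. append_count is a hypothetical helper, and FakeWorksheet only stands in for the real worksheet so that the sketch runs without credentials.

```python
from datetime import datetime


def append_count(worksheet, count, now=None):
    """Append [timestamp, count] as one row and return the row written."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    row = [stamp, count]
    worksheet.append_row(row)  # same gspread call as in step 9
    return row


class FakeWorksheet:
    """Stand-in for a gspread worksheet that just records appended rows."""

    def __init__(self):
        self.rows = []

    def append_row(self, row):
        self.rows.append(row)


sheet = FakeWorksheet()
append_count(sheet, 211, now=datetime(2020, 1, 16, 1, 47))
print(sheet.rows)  # [['2020-01-16 01:47:00', 211]]
```

With the real worksheet from step 8, the call would be append_count(worksheet, result).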

3. Result

License: Attribution, NonCommercial, ShareAlike
