Implementasi Web Scraping untuk Pengambilan Data Pada Website E-Commerce

  • Apriza Zicka Rizquina, Universitas Islam Indonesia
  • Chanifah Indah Ratnasari, Universitas Islam Indonesia
Keywords: Web Scraping, E-commerce, Data Mining, HTML Parsing, CSS Selector

Abstract

Data is a new mind; data is gold; data is a new mine. These sayings have become familiar in the digital era. Data can be used, among other things, to improve operational efficiency, spur business innovation, understand user needs, and support decision-making. Data from e-commerce sites can be collected in a variety of ways, including web scraping with Python and libraries such as Selenium, BeautifulSoup, and time. This study applied two web scraping methods, HTML Parsing and CSS Selector, to Shopee and Tokopedia. On Shopee, these methods could not be used because of the site's bot-detection and CAPTCHA mechanisms. On Tokopedia, they were used successfully. We ran 10 scraping attempts on two product pages; each attempt returned around 160–166 data points, of which 3–18 were duplicates. The average execution time of the program was 1 minute 0.5 seconds.
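
The extract-then-deduplicate flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes a static HTML snippet with hypothetical class names (`product`, `name`, `price`), and it uses Python's standard-library `html.parser` as a stand-in for the Selenium and BeautifulSoup stack named above, matching elements by class much as a CSS selector such as `.name` would. The sample data is invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names are illustrative, not Tokopedia's real ones.
SAMPLE_HTML = """
<div class="product"><span class="name">Mouse A</span><span class="price">Rp50.000</span></div>
<div class="product"><span class="name">Keyboard B</span><span class="price">Rp150.000</span></div>
<div class="product"><span class="name">Mouse A</span><span class="price">Rp50.000</span></div>
"""


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from <span> elements with class 'name' or 'price'."""

    def __init__(self):
        super().__init__()
        self.records = []    # finished (name, price) tuples
        self._field = None   # field currently being read, if any
        self._current = {}   # partially filled record for the open <div>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span":
            for field in ("name", "price"):
                if field in classes:
                    self._field = field

    def handle_data(self, data):
        # Only keep text that sits inside a span we are interested in.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current:
            # Closing the product container finishes one record.
            self.records.append(
                (self._current.get("name"), self._current.get("price"))
            )
            self._current = {}


def scrape(html):
    """Parse the HTML string and return all product records, duplicates included."""
    parser = ProductParser()
    parser.feed(html)
    return parser.records


def deduplicate(records):
    """Drop repeated records while preserving first-seen order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


records = scrape(SAMPLE_HTML)
unique = deduplicate(records)
duplicates = len(records) - len(unique)
print(f"{len(records)} scraped, {len(unique)} unique, {duplicates} duplicate(s)")
```

On a live, JavaScript-rendered page the HTML would first have to be obtained from a browser session, which is where a tool like Selenium comes in; that step is omitted here. Counting duplicates as the difference between scraped and unique records mirrors the kind of duplicate tally the abstract reports.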

Published
2023-10-03
How to Cite
Rizquina, A., & Ratnasari, C. (2023). Implementasi Web Scraping untuk Pengambilan Data Pada Website E-Commerce. Jurnal Teknologi Dan Sistem Informasi Bisnis, 5(4), 377-383. https://doi.org/10.47233/jteksis.v5i4.913