Shopper intent prediction from clickstream e-commerce data with minimal browsing information.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
2020-10-12
History:
received: 2020-05-14
accepted: 2020-09-15
entrez: 2020-10-13
pubmed: 2020-10-14
medline: 2020-10-14
Status: epublish

Abstract

We address the problem of user intent prediction from clickstream data of an e-commerce website via two conceptually different approaches: a hand-crafted feature-based classification and a deep learning-based classification. In both approaches, we deliberately coarse-grain a new proprietary clickstream dataset to produce symbolic trajectories with minimal information. We then tackle the classification of trajectories of arbitrary length and, ultimately, the early prediction of limited-length trajectories, for both balanced and unbalanced datasets. Our analysis shows that k-gram statistics with visibility graph motifs produce fast and accurate classifications, highlighting that purchase prediction is reliable even for extremely short observation windows. In the deep learning case, we benchmarked previous state-of-the-art (SOTA) models on the new dataset and improved classification accuracy over SOTA performance with our proposed LSTM architecture. We conclude with an in-depth error analysis and a careful evaluation of the pros and cons of the two approaches when applied to realistic industry use cases.
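As a rough illustration of the hand-crafted features named in the abstract, the sketch below computes k-gram relative frequencies for a symbolic trajectory and builds the edge list of a horizontal visibility graph from the same sequence. It is a minimal sketch under stated assumptions, not the authors' pipeline: the integer encoding of events, the function names `kgram_frequencies` and `horizontal_visibility_edges`, and the example trajectory are all hypothetical, and the plain edge list is a simplified stand-in for the visibility graph motif profiles used in the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def kgram_frequencies(trajectory: Sequence[int], k: int) -> Dict[Tuple[int, ...], float]:
    """Relative frequency of each length-k window in a symbolic trajectory."""
    windows = [tuple(trajectory[i:i + k]) for i in range(len(trajectory) - k + 1)]
    if not windows:
        return {}  # trajectory shorter than k: no k-grams to count
    counts = Counter(windows)
    total = len(windows)
    return {gram: n / total for gram, n in counts.items()}


def horizontal_visibility_edges(series: Sequence[float]) -> List[Tuple[int, int]]:
    """Edges (i, j), i < j, where every value strictly between i and j
    is lower than both series[i] and series[j]."""
    edges = []
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.append((i, j))
            highest_between = max(highest_between, series[j])
            # Once a value at least as high as series[i] appears,
            # nothing further to the right can be seen from i.
            if highest_between >= series[i]:
                break
    return edges


# Hypothetical encoding (an assumption): 1 = page view, 2 = product detail, 3 = add to cart.
session = [1, 1, 2, 1, 3]
bigrams = kgram_frequencies(session, k=2)
edges = horizontal_visibility_edges(session)
```

Feature vectors built this way (k-gram frequencies plus graph-derived counts) could then be passed to any standard classifier; the choice of k and of graph statistics is left open here.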

Identifiers

pubmed: 33046722
doi: 10.1038/s41598-020-73622-y
pii: 10.1038/s41598-020-73622-y
pmc: PMC7550603

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

16983

Authors

Borja Requena (B)

ICFO - Institut de Ciencies Fotoniques, The Barcelona Institute of Science and Technology, Av. Carl Friedrich Gauss 3, 08860, Castelldefels, Barcelona, Spain.

Giovanni Cassani (G)

Department of Cognitive Science and Artificial Intelligence, Tilburg University, Warandelaan 2, 5037 AB, Tilburg, The Netherlands.

Jacopo Tagliabue (J)

Coveo Labs, 44 Montgomery Street, San Francisco, CA, 94105, USA.

Ciro Greco (C)

Coveo Labs, 44 Montgomery Street, San Francisco, CA, 94105, USA.

Lucas Lacasa (L)

School of Mathematical Sciences, Queen Mary University of London, Mile End Road, London, E1 4NS, UK. l.lacasa@qmul.ac.uk.

MeSH classifications