- Authors:
- Shubham Gupta, Department of Computer Science and Engineering, Indian Institute of Technology Jodhpur, Jodhpur, India (ORCID: http://orcid.org/0000-0003-4908-843X)
- Nandini Saini, Department of Computer Science and Engineering, Indian Institute of Technology Jodhpur, Jodhpur, India (ORCID: http://orcid.org/0000-0003-1736-943X)
- Suman Kundu, Department of Computer Science and Engineering, Indian Institute of Technology Jodhpur, Jodhpur, India (ORCID: http://orcid.org/0000-0002-7856-4768)
- Debasis Das, Department of Computer Science and Engineering, Indian Institute of Technology Jodhpur, Jodhpur, India (ORCID: http://orcid.org/0000-0001-6205-4096)
Advances in Information Retrieval: 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, 2024, Proceedings, Part II. March 2024, pages 18–33. https://doi.org/10.1007/978-3-031-56060-6_2
Published: 24 March 2024
CrisisKAN: Knowledge-Infused and Explainable Multimodal Attention Network for Crisis Event Classification
Abstract
The pervasive use of social media has made it an emerging source of real-time information (images, text, or both) for identifying various events. Despite rapid progress in image- and text-based event classification, state-of-the-art (SOTA) models find it challenging to bridge the semantic gap between image and text features due to inconsistent encoding. Moreover, the black-box nature of these models prevents them from explaining their outcomes, which undermines trust in high-stakes situations such as disasters and pandemics. Additionally, the word limit imposed on social media posts can introduce bias towards specific events. To address these issues, we propose CrisisKAN, a novel Knowledge-infused and Explainable Multimodal Attention Network that combines images and text with external knowledge from Wikipedia to classify crisis events. To enrich the context-specific understanding of textual information, we integrate Wikipedia knowledge using a proposed wiki extraction algorithm. In addition, a guided cross-attention module is implemented to fill the semantic gap when integrating visual and textual data. To ensure reliability, we employ a model-specific approach called Gradient-weighted Class Activation Mapping (Grad-CAM) that provides robust explanations of the proposed model's predictions. Comprehensive experiments on the CrisisMMD dataset yield in-depth analysis across various crisis-specific tasks and settings. CrisisKAN outperforms existing SOTA methods and offers a novel perspective on explainable multimodal event classification. (Code repository: https://github.com/shubhamgpt007/CrisisKAN)
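The guided cross-attention mentioned in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's exact architecture: function names, the identity projections, and the single text-to-image direction are assumptions made here for brevity (the actual module presumably uses learned projections and operates in both directions).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guided_cross_attention(img_feats, txt_feats):
    """One direction of cross-attention: text tokens act as queries
    over image regions, producing text-guided visual features.
    Identity Q/K/V projections are used here for brevity."""
    d_k = img_feats.shape[-1]
    scores = txt_feats @ img_feats.T / np.sqrt(d_k)  # (T, I) alignment scores
    attn = softmax(scores, axis=-1)                  # each token attends over regions
    return attn @ img_feats                          # (T, d) guided features

rng = np.random.default_rng(0)
img = rng.standard_normal((49, 64))  # e.g. a 7x7 CNN feature grid, flattened
txt = rng.standard_normal((20, 64))  # e.g. 20 token embeddings
fused = guided_cross_attention(img, txt)
print(fused.shape)  # (20, 64)
```

Each row of `attn` is a distribution over image regions, so every text token pools a different weighted view of the image; this is the mechanism that lets one modality "guide" the encoding of the other.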
Published in
Advances in Information Retrieval: 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, 2024, Proceedings, Part II
March 2024, 504 pages
ISBN: 978-3-031-56059-0
DOI: 10.1007/978-3-031-56060-6
- Editors:
- Nazli Goharian, Georgetown University, Washington, DC, USA
- Nicola Tonellotto, University of Pisa, Pisa, Italy
- Yulan He, King's College London, London, UK
- Aldo Lipani, University College London, London, UK
- Graham McDonald, University of Glasgow, Glasgow, UK
- Craig Macdonald, University of Glasgow, Glasgow, UK
- Iadh Ounis, University of Glasgow, Glasgow, UK
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
Publisher
Springer-Verlag
Berlin, Heidelberg
Author Tags
- Multimodal Network
- Explainable
- Knowledge Infusion
- Crisis Detection
Qualifiers
- Article
- Conference