- Authors:
- Shubham Gupta, Department of Computer Science and Engineering, Indian Institute of Technology Jodhpur, Jodhpur, India (ORCID: http://orcid.org/0000-0003-4908-843X)
- Nandini Saini, Department of Computer Science and Engineering, Indian Institute of Technology Jodhpur, Jodhpur, India (ORCID: http://orcid.org/0000-0003-1736-943X)
- Suman Kundu, Department of Computer Science and Engineering, Indian Institute of Technology Jodhpur, Jodhpur, India (ORCID: http://orcid.org/0000-0002-7856-4768)
- Debasis Das, Department of Computer Science and Engineering, Indian Institute of Technology Jodhpur, Jodhpur, India (ORCID: http://orcid.org/0000-0001-6205-4096)
Advances in Information Retrieval: 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, 2024, Proceedings, Part II. March 2024, pages 18–33. https://doi.org/10.1007/978-3-031-56060-6_2
Published: 24 March 2024
CrisisKAN: Knowledge-Infused and Explainable Multimodal Attention Network for Crisis Event Classification
Abstract
The pervasive use of social media has made it an emerging source of real-time information (images, text, or both) for identifying various events. Despite rapid progress in image- and text-based event classification, state-of-the-art (SOTA) models find it challenging to bridge the semantic gap between image and text features due to inconsistent encoding. Moreover, the black-box nature of these models prevents them from explaining their outcomes, which undermines trust in high-stakes situations such as disasters and pandemics. Additionally, the word limit imposed on social media posts can introduce bias towards specific events. To address these issues, we propose CrisisKAN, a novel Knowledge-infused and Explainable Multimodal Attention Network that combines images and text with external knowledge from Wikipedia to classify crisis events. To enrich the context-specific understanding of textual information, we integrate Wikipedia knowledge using a proposed wiki extraction algorithm. In addition, a guided cross-attention module is implemented to fill the semantic gap when integrating visual and textual data. To ensure reliability, we employ a model-specific approach called Gradient-weighted Class Activation Mapping (Grad-CAM) that provides robust explanations of the proposed model's predictions. Comprehensive experiments on the CrisisMMD dataset yield in-depth analysis across various crisis-specific tasks and settings. CrisisKAN outperforms existing SOTA methods and offers a novel perspective on explainable multimodal event classification. (Code repository: https://github.com/shubhamgpt007/CrisisKAN)
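The guided cross-attention mentioned in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's exact architecture: function names, the identity projections, and the single text-to-image direction are assumptions made here for brevity (the actual module presumably uses learned projections and operates in both directions).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guided_cross_attention(img_feats, txt_feats):
    """One direction of cross-attention: text tokens act as queries
    over image regions, producing text-guided visual features.
    Identity Q/K/V projections are used here for brevity."""
    d_k = img_feats.shape[-1]
    scores = txt_feats @ img_feats.T / np.sqrt(d_k)  # (T, I) alignment scores
    attn = softmax(scores, axis=-1)                  # each token attends over regions
    return attn @ img_feats                          # (T, d) guided features

rng = np.random.default_rng(0)
img = rng.standard_normal((49, 64))  # e.g. a 7x7 CNN feature grid, flattened
txt = rng.standard_normal((20, 64))  # e.g. 20 token embeddings
fused = guided_cross_attention(img, txt)
print(fused.shape)  # (20, 64)
```

Each row of `attn` is a distribution over image regions, so every text token pools a different weighted view of the image; this is the mechanism that lets one modality "guide" the encoding of the other.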
Published in
Advances in Information Retrieval: 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, 2024, Proceedings, Part II
March 2024, 504 pages
ISBN: 978-3-031-56059-0
DOI: 10.1007/978-3-031-56060-6
- Editors:
- Nazli Goharian, Georgetown University, Washington, DC, USA
- Nicola Tonellotto, University of Pisa, Pisa, Italy
- Yulan He, King's College London, London, UK
- Aldo Lipani, University College London, London, UK
- Graham McDonald, University of Glasgow, Glasgow, UK
- Craig Macdonald, University of Glasgow, Glasgow, UK
- Iadh Ounis, University of Glasgow, Glasgow, UK
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
Publisher
Springer-Verlag
Berlin, Heidelberg
Author Tags
- Multimodal Network
- Explainable
- Knowledge Infusion
- Crisis Detection
Qualifiers
- Article
- Conference