MT-Former: Multi-Task Hybrid Transformer and Deep Support Vector Data Description to Detect Novel anomalies during Semiconductor Manufacturing

Hyunsu Jeong; Chiho Yoon; Hyunseok Lim; Jaesuk Chang; Sampa Misra; Chulhong Kim

doi:10.37188/lam.2025.032


Light: Advanced Manufacturing > Published> Article >Published online: 29 May 2025

Citation:

MT-Former: Multi-Task Hybrid Transformer and Deep Support Vector Data Description to Detect Novel anomalies during Semiconductor Manufacturing

Light: Advanced Manufacturing 6, Article number: (2025)


1. Graduate School of Artificial Intelligence (GSAI), Department of Electrical Engineering, Convergence IT Engineering, Mechanical Engineering, Medical Science and Engineering, and Medical Device Innovation Center, Pohang University of Science and Technology (POSTECH), Pohang, South Korea
2. Quality Intelligence System Team, SK Hynix, Icheon, 17336, South Korea

Corresponding author:
Chulhong Kim (chulhong@postech.edu)
These authors contributed equally: Hyunsu Jeong, Chiho Yoon
Received: 21 August 2024
Revised: 12 March 2025
Accepted: 25 March 2025
Accepted article preview online: 26 March 2025
Published online: 29 May 2025

doi: https://doi.org/10.37188/lam.2025.032

Abstract

Defect inspection is critical in semiconductor manufacturing for improving product quality at reduced production cost. A newly introduced manufacturing process is often associated with a new set of defects that can cause serious damage to the manufacturing system. Therefore, distinguishing new defects from existing ones provides crucial clues for fixing issues in the newly introduced process. We present a multi-task hybrid transformer (MT-former) that distinguishes novel defects from known defects in electron microscope images of semiconductors. MT-former consists of upstream and downstream training stages. In the upstream stage, the encoder of a hybrid transformer is trained by solving both classification and reconstruction tasks on the existing defects. In the downstream stage, the shared encoder is fine-tuned by simultaneously learning classification and a deep support vector data description (Deep-SVDD) to detect new defects among the existing ones. We also design a hybrid transformer that combines a convolutional module with an efficient self-attention module and is trained with focal loss. Our model is evaluated on real-world data from SK Hynix and on two publicly available datasets, magnetic tile defects and HAM10000. On the SK Hynix data, MT-former achieved a higher AUC than a Deep-SVDD model, by 8.19% for anomaly detection and by 9.59% for classifying the existing classes. Furthermore, the best AUCs on the public datasets (magnetic tile defect 67.9%, HAM10000 70.73%) were achieved with the proposed model, which suggests that MT-former is a useful model for distinguishing new types of defects from existing ones.
- Semiconductor defect inspection
- Deep-SVDD
- Multi-task learning
- Anomaly detection
- Hybrid-transformer

References

[1]	Gómez-Sirvent, J. L. et al. Defect classification on semiconductor wafers using fisher vector and visual vocabularies coding. Measurement 202, 111872 (2022).
[2]	Harada, M., Minekawa, Y. & Nakamae, K. Defect detection techniques robust to process variation in semiconductor inspection. Measurement Science and Technology 30, 035402 (2019).
[3]	Bhonsle, R. et al. Inspection, characterization and classification of defects for improved CMP of III-V materials. ECS Journal of Solid State Science and Technology 4, P5073-P5077 (2015).
[4]	Zipfel, J. et al. Anomaly detection for industrial quality assurance: a comparative evaluation of unsupervised deep learning models. Computers & Industrial Engineering 177, 109045 (2023).
[5]	Chandola, V., Banerjee, A. & Kumar, V. Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41, 15 (2009).
[6]	Cover, T. & Hart, P. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 21-27 (1967).
[7]	Carratù, M. et al. A novel methodology for unsupervised anomaly detection in industrial electrical systems. IEEE Transactions on Instrumentation and Measurement 72, 3532812 (2023).
[8]	Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, NV, USA: Curran Associates Inc., 2012, 1097-1105.
[9]	He, K. M. et al. Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016, 770-778.
[10]	Yang, J. G. et al. Recent advances in deep-learning-enhanced photoacoustic imaging. Advanced Photonics Nexus 2, 054001 (2023).
[11]	Park, J. et al. Clinical translation of photoacoustic imaging. Nature Reviews Bioengineering (2024) http://dx.doi.org/10.1038/s44222-024-00240-y.
[12]	Misra, S. et al. Deep learning‐based multimodal fusion network for segmentation and classification of breast cancers using B‐mode and elastography ultrasound images. Bioengineering & Translational Medicine 8, e10480 (2023).
[13]	Yoon, C. et al. Collaborative multi-modal deep learning and radiomic features for classification of strokes within 6h. Expert Systems with Applications 228, 120473 (2023).
[14]	Jeong, H. et al. Robust ensemble of two different multimodal approaches to segment 3D ischemic stroke segmentation using brain tumor representation among multiple center datasets. Journal of Imaging Informatics in Medicine 37, 2375-2389 (2024).
[15]	Park, E. et al. Unsupervised inter-domain transformation for virtually stained high-resolution mid-infrared photoacoustic microscopy using explainable deep learning. Nature Communications 15, 10892 (2024).
[16]	Kim, S. et al. Convolutional neural network–based metal and streak artifacts reduction in dental CT images with sparse‐view sampling scheme. Medical Physics 49, 6253-6277 (2022).
[17]	Misra, S. et al. Bi-modal transfer learning for classifying breast cancers via combined B-mode and ultrasound strain imaging. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 69, 222-232 (2022).
[18]	Choi, S. et al. Deep learning enhances multiparametric dynamic volumetric photoacoustic computed tomography in vivo (DL‐PACT). Advanced Science 10, 2202089 (2023).
[19]	Wang, M., Zhou, D. H. & Chen, M. Y. Hybrid variable monitoring mixture model for anomaly detection in industrial processes. IEEE Transactions on Cybernetics 54, 319-331 (2024).
[20]	Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504-507 (2006).
[21]	Bergmann, P. et al. Improving unsupervised defect segmentation by applying structural similarity to autoencoders. Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Funchal, Portugal: VISIGRAPP, 2019, 372-380.
[22]	Sakurada, M. & Yairi, T. Anomaly detection using autoencoders with nonlinear dimensionality reduction. Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis. Gold Coast, Australia: ACM, 2014, 4-11.
[23]	Masci, J. et al. Stacked convolutional auto-encoders for hierarchical feature extraction. Proceedings of the 21st International Conference on Artificial Neural Networks and Machine Learning. Espoo, Finland: Springer, 2011, 52-59.
[24]	Zhang, H. B. et al. Unsupervised deep anomaly detection for medical images using an improved adversarial autoencoder. Journal of Digital Imaging 35, 153-161 (2022).
[25]	Zhang, C. K., Wang, Y. M. & Tan, W. M. MTHM: self-supervised multitask anomaly detection with hard example mining. IEEE Transactions on Instrumentation and Measurement 72, 3518613 (2023).
[26]	Luo, J. X. et al. SMD anomaly detection: a self-supervised texture–structure anomaly detection framework. IEEE Transactions on Instrumentation and Measurement 71, 5017611 (2022).
[27]	Cheng, X. et al. Deep self-representation learning framework for hyperspectral anomaly detection. IEEE Transactions on Instrumentation and Measurement 73, 5002016 (2024).
[28]	Goodfellow, I. et al. Generative adversarial nets. Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014, 2672-2680.
[29]	Kim, J. et al. Deep learning alignment of bidirectional raster scanning in high speed photoacoustic microscopy. Scientific Reports 12, 16238 (2022).
[30]	Kim, G. et al. Integrated deep learning framework for accelerated optical coherence tomography angiography. Scientific Reports 12, 1289 (2022).
[31]	Kim, J. et al. Deep learning acceleration of multiscale superresolution localization photoacoustic imaging. Light: Science & Applications 11, 131 (2022).
[32]	Niu, M. H. et al. An adaptive pyramid graph and variation residual-based anomaly detection network for rail surface defects. IEEE Transactions on Instrumentation and Measurement 70, 5020013 (2021).
[33]	Schlegl, T. et al. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. Proceedings of the 25th International Conference on Information Processing in Medical Imaging. Boone, NC, USA: Springer, 2017, 146-157.
[34]	Lee, S. et al. Emergency triage of brain computed tomography via anomaly detection with a deep generative model. Nature Communications 13, 4251 (2022).
[35]	Akcay, S., Atapour-Abarghouei, A. & Breckon, T. P. GANomaly: semi-supervised anomaly detection via adversarial training. Proceedings of the 14th Asian Conference on Computer Vision. Perth, Australia: Springer, 2019, 622-637.
[36]	Ruff, L. et al. Deep one-class classification. Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR, 2018, 4393-4402.
[37]	Misra, S. et al. A voting-based ensemble feature network for semiconductor wafer defect classification. Scientific Reports 12, 16254 (2022).
[38]	Imoto, K. et al. A CNN-based transfer learning method for defect classification in semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing 32, 455-459 (2019). doi: 10.1109/TSM.2019.2941752
[39]	Chen, Z. Q. et al. DMVSVDD: multi-view data novelty detection with deep autoencoding support vector data description. Expert Systems with Applications 240, 122443 (2024).
[40]	Dong, X. H., Taylor, C. J. & Cootes, T. F. Defect classification and detection using a multitask deep one-class CNN. IEEE Transactions on Automation Science and Engineering 19, 1719-1730 (2022). doi: 10.1109/TASE.2021.3109353
[41]	Liu, B. et al. Adaboost-based SVDD for anomaly detection with dictionary learning. Expert Systems with Applications 238, 121770 (2024).
[42]	Zhou, Y. et al. VAE-based deep SVDD for anomaly detection. Neurocomputing 453, 131-140 (2021).
[43]	Yi, J. H. & Yoon, S. Patch SVDD: patch-level SVDD for anomaly detection and segmentation. Proceedings of the 15th Asian Conference on Computer Vision. Kyoto, Japan: Springer, 2020, 375-390.
[44]	Roth, K. et al. Towards total recall in industrial anomaly detection. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022, 14298-14308.
[45]	Yu, J. B. & Liu, J. T. Two-dimensional principal component analysis-based convolutional autoencoder for wafer map defect detection. IEEE Transactions on Industrial Electronics 68, 8789-8797 (2021).
[46]	Kang, H. & Kang, S. A stacking ensemble classifier with handcrafted and convolutional features for wafer map pattern classification. Computers in Industry 129, 103450 (2021).
[47]	Cheon, S. et al. Convolutional neural network for wafer surface defect classification and the detection of unknown defect class. IEEE Transactions on Semiconductor Manufacturing 32, 163-170 (2019).
[48]	Wen, G. J. et al. A novel method based on deep convolutional neural networks for wafer semiconductor surface defect inspection. IEEE Transactions on Instrumentation and Measurement 69, 9668-9680 (2020).
[49]	Kim, E. S. et al. An oversampling method for wafer map defect pattern classification considering small and imbalanced data. Computers & Industrial Engineering 162, 107767 (2021).
[50]	Tao, X. et al. Deep learning for unsupervised anomaly localization in industrial images: A survey. IEEE Transactions on Instrumentation and Measurement 71, 5018021 (2022).
[51]	Gao, Y. P. et al. A multilevel information fusion-based deep learning method for vision-based defect recognition. IEEE Transactions on Instrumentation and Measurement 69, 3980-3991 (2020).
[52]	Yang, L. M., Zhou, F. Q. & Wang, L. A scratch detection method based on deep learning and image segmentation. IEEE Transactions on Instrumentation and Measurement 71, 5015012 (2022).
[53]	Tao, X. et al. ViTALnet: anomaly on industrial textured surfaces with hybrid transformer. IEEE Transactions on Instrumentation and Measurement 72, 5009013 (2023).
[54]	Shang, H. B. et al. Defect-aware transformer network for intelligent visual surface defect detection. Advanced Engineering Informatics 55, 101882 (2023).
[55]	Vaswani, A. et al. Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, CA, USA: Curran Associates Inc., 2017, 6000-6010.
[56]	Gao, Y. H., Zhou, M. & Metaxas, D. N. UTNet: a hybrid transformer architecture for medical image segmentation. Proceedings of the 24th International Conference on Medical Image Computing and Computer Assisted Intervention. Strasbourg, France: Springer, 2021, 61-71.
[57]	Wang, S. N. et al. Linformer: self-attention with linear complexity. Preprint at https://arxiv.org/abs/2006.04768 (2020).
[58]	Bello, I. et al. Attention augmented convolutional networks. Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE, 2019, 3285-3294.
[59]	Shaw, P., Uszkoreit, J. & Vaswani, A. Self-attention with relative position representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). New Orleans, LA, USA: ACL, 2018, 464-468.
[60]	Caruana, R. Multitask learning. Machine Learning 28, 41-75 (1997).
[61]	Lin, T. Y. et al. Focal loss for dense object detection. Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017, 2999-3007.
[62]	Dureuil, V. et al. Wafer bevel shape inducing high defect density in shallow trench isolation process. Proceedings of 2010 IEEE/SEMI Advanced Semiconductor Manufacturing Conference (ASMC). San Francisco, CA, USA: IEEE, 2010, 213-216.
[63]	Huang, Y. B., Qiu, C. Y. & Yuan, K. Surface defect saliency of magnetic tile. The Visual Computer 36, 85-96 (2020). doi: 10.1007/s00371-018-1588-5
[64]	Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data 5, 180161 (2018). doi: 10.1038/sdata.2018.161
[65]	Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 2579-2605 (2008).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article′s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article′s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.



Research Summary

MT-Former: Multi-task Hybrid Transformer for Semiconductor Anomaly Detection

The introduction of a new manufacturing process often brings a related set of defects that can significantly impact the system. Distinguishing between existing and new defects is key to resolving issues in the newly introduced manufacturing process. We propose a multi-task hybrid transformer (MT-Former) that can distinguish between existing and new defects in electron microscope images of semiconductors. MT-Former is trained in two stages. In the upstream phase, a hybrid transformer encoder is jointly trained on classification and reconstruction tasks using existing defect data. In the downstream phase, the shared encoder is fine-tuned using a deep support vector data description (Deep-SVDD) approach to identify novel anomalies. The experimental results demonstrated that MT-Former performed adequately in detecting new defects arising from the newly introduced semiconductor processes.


Experiments

Comparison models

We compared the accuracy of MT-former with six state-of-the-art deep anomaly detection models: DAE²⁰, GANomaly³⁵, AnoGAN³³, Patch-SVDD⁴³, Patchcore⁴⁴, and Deep-SVDD³⁶.

1. DAE²⁰ uses reconstruction error as a criterion for judging anomaly scores.

2. AnoGAN³³ is an early GAN-based anomaly detection model, which calculates anomaly scores by mapping a query image from image space back to the latent space of the generator.

3. GANomaly³⁵ adds an encoder to the AnoGAN design, so it learns the image space and the latent space at once, which makes it more intuitive than AnoGAN.

4. Patch-SVDD⁴³ is an approach that can utilize local information by embedding in patch units. For comparison, Patch-SVDD's patch size is set to 32, which is half the input size.

5. Patchcore⁴⁴ extracts patch-wise features based on a model trained on ImageNet data to detect anomalies.

6. Deep-SVDD³⁶ obtains a hypersphere surrounding normal data, then uses it to identify abnormalities.
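
The hypersphere idea behind Deep-SVDD can be illustrated with a minimal sketch. It assumes that latent embeddings have already been produced by some encoder (no network is trained here), takes the center as the mean of the normal embeddings, and scores a sample by its squared distance to that center. The helper names and the quantile-based radius are hypothetical choices for illustration, not the settings used in this study.

```python
def hypersphere_center(embeddings):
    """Mean of the normal-sample embeddings, used as the fixed center c."""
    dim = len(embeddings[0])
    n = len(embeddings)
    return [sum(e[d] for e in embeddings) / n for d in range(dim)]


def svdd_score(embedding, center):
    """Squared Euclidean distance to the center; larger means more anomalous."""
    return sum((x - c) ** 2 for x, c in zip(embedding, center))


def radius_from_quantile(scores, quantile=0.95):
    """Hypothetical thresholding rule: take a quantile of the normal scores."""
    ordered = sorted(scores)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


# Toy 2-D embeddings standing in for encoder outputs of existing (normal) defects.
normal = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center = hypersphere_center(normal)
normal_scores = [svdd_score(e, center) for e in normal]
threshold = radius_from_quantile(normal_scores)

# A sample far from the center is flagged as a candidate new defect.
novel_score = svdd_score([5.0, 5.0], center)
is_novel = novel_score > threshold
```

In this toy setting every normal embedding lies on the sphere boundary, so only samples strictly farther from the center than the normal data are flagged.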

Evaluation metrics

To evaluate anomaly-detection accuracy, we defined abnormal cases as ‘positive’ and used four evaluation metrics: true positive rate (TPR, recall), false positive rate (FPR), signal-to-background ratio (S/B), and area under the receiver operating characteristic curve (AUC). Receiver operating characteristic (ROC) curves visualize the tradeoff between TPR and FPR at different thresholds, while AUC summarizes the overall detection accuracy as the area under the ROC curve. In the classification experiments, we adopted the weighted average AUC because, from the viewpoints of quality inspection and cost, frequent defects are the most important. The metrics are defined as:

$$ TPR=Recall=\frac{TP}{TP+FN} $$ (11)

$$ FPR=\frac{FP}{FP+TN} $$ (12)

$$ S/B=\frac{TP}{FP} $$ (13)

$$ Weighted\;Average=\sum _{i=1}^{c}{w}_{i}\times {s}_{i} $$ (14)

where $w_i$ is the number of samples belonging to class $i$ and $s_i$ is the score of class $i$.
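
As a minimal sketch of Eqs. (11) to (14), the functions below compute the four quantities from labels and anomaly scores at a single threshold. Two points are assumptions made here rather than details from the study: a sample is predicted abnormal when its score exceeds the threshold, and the class counts in the weighted average are normalized by the total sample count so that the result stays on the same scale as the per-class scores. All function names are hypothetical.

```python
def confusion_counts(labels, scores, threshold):
    """labels: 1 = abnormal (positive), 0 = normal. Predict positive if score > threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s > threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def tpr(tp, fn):
    return tp / (tp + fn)  # Eq. (11)


def fpr(fp, tn):
    return fp / (fp + tn)  # Eq. (12)


def signal_to_background(tp, fp):
    return tp / fp if fp else float("inf")  # Eq. (13)


def weighted_average(class_counts, class_scores):
    """Eq. (14) with weights w_i = n_i / N (normalization is an assumption)."""
    total = sum(class_counts)
    return sum(n / total * s for n, s in zip(class_counts, class_scores))


# Toy data: four normal and two abnormal samples.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.7, 0.4]
tp, fp, tn, fn = confusion_counts(labels, scores, threshold=0.5)
```

Because the abnormal class is tiny in the SK-defect test set, S/B is sensitive to even a small number of false positives, which is why it is reported alongside TPR and FPR.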

Data Acquisition

In this study, all experiments were conducted on 6078 defect images, including 24 images of new defects, from a domestic FAB process of SK Hynix (SK-defect, SK Hynix, South Korea). The SK data were collected under different settings and system environments. The defect images were provided at sizes of 64 to 80 pixels, which allows engineers to identify defects visually and enables rapid analysis. The images typically have a field of view (FOV) ranging from 1 µm to 2 µm, which corresponds to approximately 15.6 nm to 31.3 nm per pixel for 64 × 64 pixel images. For each manufacturing process, defect types were defined considering image shape and process characteristics (Fig. 1). For instance, the ALIGN_ERR defect tends to exhibit alignment errors where the target is far from the center, the BOL defect often shows a round or circular shape, the FLKE defect looks like flakes, which are similar to thin chip-like fragments⁶², and the FOCUS_ERR defect is characterized by blurry or out-of-focus patterns in the imaging. The collected data were divided into 2951 training and 3127 testing images (Table 2a). The 18 defect classes comprise 15 existing types of defects and three new types of defects. This study considers the existing defects as ‘normal’ cases for the Deep-SVDD task. The data are not publicly available; however, the authors will make them available upon reasonable request and with the permission of SK Hynix.
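
The per-pixel sampling quoted above follows directly from dividing the field of view by the number of pixels. The short sketch below reproduces that arithmetic, assuming a square image whose FOV spans the full image width; the function name is hypothetical.

```python
def nm_per_pixel(fov_um, pixels):
    """Pixel pitch in nanometres for a field of view given in micrometres."""
    return fov_um * 1000.0 / pixels


pitch_small_fov = nm_per_pixel(1.0, 64)  # 1 µm FOV across 64 pixels
pitch_large_fov = nm_per_pixel(2.0, 64)  # 2 µm FOV across 64 pixels
```

The two values, 15.625 nm and 31.25 nm, round to the 15.6 nm and 31.3 nm stated in the text.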

(a) SK-defect
Data subclass	Train	Test (normal)	Test (abnormal)	Total
ALIGN_ERR	178	90	−	268
ARCI	76	3	−	79
BOL	347	56	−	403
BOL2	278	1522	−	1800
CMSC	199	27	−	226
CRAC	257	75	−	332
DDGG	451	743	−	1194
DESF	222	296	−	518
FLKE	153	8	−	161
FOCUS_ERR	147	26	−	173
L_FLKE	26	2	−	28
LFVO	106	9	−	115
NOIS	150	60	−	210
RESI	119	12	−	131
SBPT	242	174	−	416
GOTGAM	−	−	9	9
L_SFPT	−	−	2	2
SFPT	−	−	13	13
Total	2951	3103	24	6078
(b) Magnetic tile defect
Data subclass	Train	Test (normal)	Test (abnormal)	Total
Blowhole	92	23	−	115
Break	68	17	−	85
Crack	45	12	−	57
Uneven	82	21	−	103
Fray	−	−	32	32
Total	287	73	32	392
(c) HAM10000
Data subclass	Train	Test	Total
0	228	66	294
1	359	103	462
2	769	220	989
3	80	23	103
4	779	223	1002
5	4693	1341	6034
6	99	29	128
Total	7007	2005	9012

Table 2. Class distribution of the training and test sets in the defect image datasets.

To robustly validate the capability of our method, we evaluated our models on two public datasets. The magnetic tile defect dataset was previously utilized for another validation⁶³. This dataset contains one non-defect case and five defect cases: blowhole, crack, fray, break, and uneven. To evaluate the ability to distinguish new defects, we excluded the non-defect case and used only the five defect classes, setting the ‘fray’ class, which had the fewest instances, as the ‘new’ defect. For the normal defect data, we split the data 8:2 for training and testing, and all new defects were used for testing. In the final dataset, the number of training data was 287 and the number of testing data was 105, including 32 abnormal cases (Table 2b). The well-known HAM10000 dataset⁶⁴ consists of seven subclasses and is imbalanced. The HAM10000 dataset is divided into 7007 training, 1003 validation, and 2005 testing images (Table 2c). In our experiments with this dataset, we treated each subclass as a ‘new’ defect in turn, while considering the remaining subclasses as normal defects, which allowed us to observe the performance differences across classes.
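
The hold-out protocol for the magnetic tile data can be sketched as follows. The sketch assumes a per-class 8:2 split with the training count rounded down, and marks every sample of the held-out class as abnormal in the test set; the function name, the seed, and the per-class flooring are assumptions, since the exact splitting procedure is not stated. With the class sizes from Table 2b, this flooring reproduces the reported 287 training and 105 testing samples.

```python
import random


def novel_class_split(samples, novel_class, train_fraction=0.8, seed=0):
    """samples: list of (item, label) pairs.

    Returns (train, test): train holds (item, label) for known classes only;
    test holds (item, label, is_abnormal) with is_abnormal = 1 for the novel class.
    """
    rng = random.Random(seed)
    by_class = {}
    for item, label in samples:
        by_class.setdefault(label, []).append(item)

    train, test = [], []
    for label, items in by_class.items():
        if label == novel_class:
            # The held-out class never appears in training.
            test.extend((item, label, 1) for item in items)
            continue
        shuffled = items[:]
        rng.shuffle(shuffled)
        cut = int(train_fraction * len(shuffled))  # floor of 80% per class
        train.extend((item, label) for item in shuffled[:cut])
        test.extend((item, label, 0) for item in shuffled[cut:])
    return train, test


# Class sizes taken from Table 2b; the file names are placeholders.
counts = {"blowhole": 115, "break": 85, "crack": 57, "uneven": 103, "fray": 32}
samples = [(f"{name}_{i}", name) for name, n in counts.items() for i in range(n)]
train, test = novel_class_split(samples, novel_class="fray")
```

For HAM10000, the same function could be called once per subclass, passing each subclass in turn as `novel_class`, to mirror the leave-one-class-out evaluation described above.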

Implementation Details

The experiments were conducted using the PyTorch framework and executed on an NVIDIA Tesla T4 GPU with 16 GB of RAM. The initial learning rate was set to 0.001, and the Adam optimizer updated the model parameters with a weight decay of 0.0005. All images in datasets were resized to 64 × 64. Additionally, random horizontal-flip or vertical-flip data augmentations were applied. All models were trained for 100 epochs for upstream tasks and 800 epochs for downstream tasks, using a batch size of 128.
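
The flip augmentation can be illustrated with a dependency-free sketch operating on an image stored as a list of rows. In a PyTorch pipeline this would more typically be done with torchvision's `RandomHorizontalFlip` and `RandomVerticalFlip` transforms. The flip probability of 0.5 and the independent application of the two flips are assumptions, as the text does not specify them.

```python
import random


def random_flip(image, rng, p=0.5):
    """Flip a 2-D image (list of rows) horizontally and/or vertically.

    Each flip is applied independently with probability p (assumed value).
    """
    if rng.random() < p:
        image = [row[::-1] for row in image]  # horizontal flip: mirror left-right
    if rng.random() < p:
        image = image[::-1]  # vertical flip: mirror top-bottom
    return image


rng = random.Random(0)
tile = [[1, 2],
        [3, 4]]
always_flipped = random_flip(tile, rng, p=1.0)  # both flips applied
never_flipped = random_flip(tile, rng, p=0.0)   # image returned unchanged
```

Flips are a natural choice here because the defect class of an SEM patch should not depend on its orientation, although that invariance is itself an assumption for classes such as ALIGN_ERR.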

The same settings were also used for the external validation on the magnetic tile defect and HAM10000 datasets. The HAM10000 images, which have a small input size, were scaled to 32 × 32, because typical anomaly detection models require a minimum input size of 32 for effective learning and sufficient feature representation³³. The proposed and comparison models were trained for 100 epochs with a batch size of 2. To deal with the small size of the input data, all comparison models were parameterized with the same network architecture as our proposed model, including a latent vector size of 64 and a model depth of 3.

Conclusion

Conventional anomaly detection only distinguishes between non-defective and defective images, so the models are only required to identify the regular patterns of normal images. Deep autoencoders (DAEs), a classic anomaly detection model, perform poorly when applied to multiple normal classes because it is difficult to accurately reconstruct different normal shapes. Deep Support Vector Data Description (Deep-SVDD) is known to identify anomalies given a single normal class. Since the normal cases have similar shapes and patterns in the conventional task, the latent vectors are clustered properly even if they are trained as one class. However, we observed that the vectors are not properly clustered if the existing abnormal cases are heterogeneous in shape and pattern, as in our task. Our analysis indicates that training diverse patterns of existing defects as a single class leads to a broad data distribution over the existing defects, so that new defects fall inside this broad distribution. To solve this problem, we used multi-task learning (MTL) to simultaneously learn to distinguish the existing abnormal classes, which avoids the broad distribution and leads to well-clustered latent vectors. In addition, since our model is trained only on the existing defects, it does not require retraining when a new defect is identified. Considering the intricate patterns of the current defect classes, we introduced MT-former, which uses MTL and an efficient self-attention module (ESAM) to detect unknown defects that existing inspection systems cannot find in scanning electron microscope (SEM) images. MTL can simultaneously classify existing defect types of various forms, so it clusters the existing classes efficiently for anomaly identification. MTL also greatly stabilizes training with respect to the false positive rate (FPR) and false negative rate (FNR). ESAM captures global contextual features to account for irregular patterns from complex fabrication systems, and maximizes the efficiency of focal loss to effectively analyze imbalanced data.
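
To make the role of focal loss on imbalanced data concrete, the sketch below implements the standard form from Lin et al.⁶¹, FL(p_t) = −α(1 − p_t)^γ log(p_t), for a single sample. The values γ = 2 and α = 1 are illustrative assumptions, not the hyperparameters used in this study, and the function names are hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for one sample given predicted class probabilities."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss: cross-entropy scaled by (1 - p_t)^gamma to down-weight easy samples."""
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy sample (true class predicted with 0.9) and a hard one (0.1).
easy = [0.9, 0.1]
hard = [0.1, 0.9]
easy_ratio = focal_loss(easy, 0) / cross_entropy(easy, 0)  # (1 - 0.9)^2
hard_ratio = focal_loss(hard, 0) / cross_entropy(hard, 0)  # (1 - 0.1)^2
```

With γ = 2, the easy sample keeps about 1% of its cross-entropy loss while the hard sample keeps about 81%, so frequent, well-classified classes such as BOL2 contribute less to the gradient than rare or confusing ones.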

Compared with SOTA models, our method shows a better TPR, especially in the region below 20% FPR, together with a high AUC (Fig. 3), indicating that our model provides more balanced performance across thresholds. Table 3 shows that while GANomaly records a low FPR at its optimal threshold, its TPR remains below 50%. Because missing even a few defects could lead to significant economic losses in the semiconductor industry, such a model is unsuitable for practical applications. Hence, our model proves to be more effective than the other methods in this field. These improvements are attributed to the integration of the ESAM and the convolutional module as a block, enabling effective extraction of both local and global features. As a result, our model successfully detects defects across both localized areas and broader regions, enhancing overall reliability. Finally, the model pretrained for anomaly detection can also serve as a weight initialization for classifying the existing-defect classes.
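
The comparison in the low-FPR region can be made concrete with a small sketch that sweeps the threshold over the anomaly scores, builds the ROC points, and reads off both the AUC and the best TPR achievable within an FPR budget. The function names and toy scores are hypothetical, and the trapezoidal AUC is one common convention rather than a detail taken from this study.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    labels: 1 = abnormal (positive), 0 = normal. Tied scores are processed together.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = order[i][0]
        while i < len(order) and order[i][0] == current:
            if order[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points, max_fpr=0.2):
    """Highest TPR reachable without exceeding the FPR budget."""
    return max(t for f, t in points if f <= max_fpr)


# Toy scores: three abnormal and five normal samples.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
points = roc_points(labels, scores)
area = auc(points)
best_tpr = tpr_at_fpr(points, max_fpr=0.2)
```

Two models with similar AUC can differ substantially in `tpr_at_fpr`, which is the distinction drawn above between balanced performance and a low FPR obtained at the cost of missed defects.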

For future work, we discuss three potential challenges. First, because the data are imbalanced, even a small increase in FPR means that a large number of normal cases are misclassified as abnormal. Therefore, further reduction of the FPR is essential to achieve reliable anomaly detection. Second, the sub-sampling approach of the ESAM should be studied further. In its current implementation, the sub-sampling reduces resolution at a fixed ratio across all layers. However, applying the same sub-sampling rate to the smaller feature maps of the last layers can cause a substantial loss of abstract information. Therefore, an optimal sub-sampling approach with less information loss needs to be investigated. Last, the model's explainability should be addressed in more depth. While visualization techniques like t-SNE provide valuable insights into the model's behavior, incorporating methods such as Class Activation Mapping (CAM) could highlight the regions the model focuses on during classification or anomaly detection tasks. This advancement would improve interpretability and offer deeper insights into the model’s decision-making process, fostering greater transparency and trust in its application.

Acknowledgements

This work was supported by SK Hynix AICC (P23.03); by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (2023R1A2C3004880) and the Ministry of Education (2020R1A6A1A03047902 and 2022R1A6A1A03052954); by Basic Science Research Program through the NRF funded by the Ministry of Education (RS-2024-00415450); by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.RS-2019-II191906, Artificial Intelligence Graduate School Program (POSTECH)); by the BK21 FOUR project; by Glocal University 30 projects.

Reference (65)

[1]	Gómez-Sirvent, J. L. et al. Defect classification on semiconductor wafers using fisher vector and visual vocabularies coding. Measurement 202, 111872 (2022).
[2]	Harada, M., Minekawa, Y. & Nakamae, K. Defect detection techniques robust to process variation in semiconductor inspection. Measurement Science and Technology 30, 035402 (2019).
[3]	Bhonsle, R. et al. Inspection, characterization and classification of defects for improved CMP of III-V materials. ECS Journal of Solid State Science and Technology 4, P5073-P5077 (2015).
[4]	Zipfel, J. et al. Anomaly detection for industrial quality assurance: a comparative evaluation of unsupervised deep learning models. Computers & Industrial Engineering 177, 109045 (2023).
[5]	Chandola, V., Banerjee, A. & Kumar, V. Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41, 15 (2009).
[6]	Cover, T. & Hart, P. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 21-27 (1967).
[7]	Carratù, M. et al. A novel methodology for unsupervised anomaly detection in industrial electrical systems. IEEE Transactions on Instrumentation and Measurement 72, 3532812 (2023).
[8]	Krizhevsky, A. , Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, NV, USA: Curran Associates Inc. , 2012, 1097-1105.
[9]	He, K. M. et al. Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016, 770-778.
[10]	Yang, J. G. et al. Recent advances in deep-learning-enhanced photoacoustic imaging. Advanced Photonics Nexus 2, 054001 (2023).
[11]	Park, J. et al. Clinical translation of photoacoustic imaging. Nature Reviews Bioengineering (2024) http://dx. doi.org/10.1038/s44222-024-00240-y.
[12]	Misra, S. et al. Deep learning‐based multimodal fusion network for segmentation and classification of breast cancers using B‐mode and elastography ultrasound images. Bioengineering & Translational Medicine 8, e10480 (2023).
[13]	Yoon, C. et al. Collaborative multi-modal deep learning and radiomic features for classification of strokes within 6h. Expert Systems with Applications 228, 120473 (2023).
[14]	Jeong, H. et al. Robust ensemble of two different multimodal approaches to segment 3D ischemic stroke segmentation using brain tumor representation among multiple center datasets. Journal of Imaging Informatics in Medicine 37, 2375-2389 (2024).
[15]	Park, E. et al. Unsupervised inter-domain transformation for virtually stained high-resolution mid-infrared photoacoustic microscopy using explainable deep learning. Nature Communications 15, 10892 (2024).
[16]	Kim, S. et al. Convolutional neural network–based metal and streak artifacts reduction in dental CT images with sparse‐view sampling scheme. Medical Physics 49, 6253-6277 (2022).
[17]	Misra, S. et al. Bi-modal transfer learning for classifying breast cancers via combined B-mode and ultrasound strain imaging. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 69, 222-232 (2022).
[18]	Choi, S. et al. Deep learning enhances multiparametric dynamic volumetric photoacoustic computed tomography in vivo (DL‐PACT). Advanced Science 10, 2202089 (2023).
[19]	Wang, M., Zhou, D. H. & Chen, M. Y. Hybrid variable monitoring mixture model for anomaly detection in industrial processes. IEEE Transactions on Cybernetics 54, 319-331 (2024).
[20]	Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504-507 (2006).
[21]	Bergmann, P. et al. Improving unsupervised defect segmentation by applying structural similarity to autoencoders. Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Funchal, Portugal: VISIGRAPP, 2019, 372-380.
[22]	Sakurada, M. & Yairi, T. Anomaly detection using autoencoders with nonlinear dimensionality reduction. Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis. Gold Coast, Australia: ACM, 2014, 4-11.
[23]	Masci, J. et al. Stacked convolutional auto-encoders for hierarchical feature extraction. Proceedings of the 21st International Conference on Artificial Neural Networks and Machine Learning. Espoo, Finland: Springer, 2011, 52-59.
[24]	Zhang, H. B. et al. Unsupervised deep anomaly detection for medical images using an improved adversarial autoencoder. Journal of Digital Imaging 35, 153-161 (2022).
[25]	Zhang, C. K., Wang, Y. M. & Tan, W. M. MTHM: self-supervised multitask anomaly detection with hard example mining. IEEE Transactions on Instrumentation and Measurement 72, 3518613 (2023).
[26]	Luo, J. X. et al. SMD anomaly detection: a self-supervised texture–structure anomaly detection framework. IEEE Transactions on Instrumentation and Measurement 71, 5017611 (2022).
[27]	Cheng, X. et al. Deep self-representation learning framework for hyperspectral anomaly detection. IEEE Transactions on Instrumentation and Measurement 73, 5002016 (2024).
[28]	Goodfellow, I. et al. Generative adversarial nets. Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014, 2672-2680.
[29]	Kim, J. et al. Deep learning alignment of bidirectional raster scanning in high speed photoacoustic microscopy. Scientific Reports 12, 16238 (2022).
[30]	Kim, G. et al. Integrated deep learning framework for accelerated optical coherence tomography angiography. Scientific Reports 12, 1289 (2022).
[31]	Kim, J. et al. Deep learning acceleration of multiscale superresolution localization photoacoustic imaging. Light: Science & Applications 11, 131 (2022).
[32]	Niu, M. H. et al. An adaptive pyramid graph and variation residual-based anomaly detection network for rail surface defects. IEEE Transactions on Instrumentation and Measurement 70, 5020013 (2021).
[33]	Schlegl, T. et al. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. Proceedings of the 25th International Conference on Information Processing in Medical Imaging. Boone, NC, USA: Springer, 2017, 146-157.
[34]	Lee, S. et al. Emergency triage of brain computed tomography via anomaly detection with a deep generative model. Nature Communications 13, 4251 (2022).
[35]	Akcay, S., Atapour-Abarghouei, A. & Breckon, T. P. GANomaly: semi-supervised anomaly detection via adversarial training. Proceedings of the 14th Asian Conference on Computer Vision. Perth, Australia: Springer, 2019, 622-637.
[36]	Ruff, L. et al. Deep one-class classification. Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR, 2018, 4393-4402.
[37]	Misra, S. et al. A voting-based ensemble feature network for semiconductor wafer defect classification. Scientific Reports 12, 16254 (2022).
[38]	Imoto, K. et al. A CNN-based transfer learning method for defect classification in semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing 32, 455-459 (2019).
[39]	Chen, Z. Q. et al. DMVSVDD: multi-view data novelty detection with deep autoencoding support vector data description. Expert Systems with Applications 240, 122443 (2024).
[40]	Dong, X. H., Taylor, C. J. & Cootes, T. F. Defect classification and detection using a multitask deep one-class CNN. IEEE Transactions on Automation Science and Engineering 19, 1719-1730 (2022).
[41]	Liu, B. et al. Adaboost-based SVDD for anomaly detection with dictionary learning. Expert Systems with Applications 238, 121770 (2024).
[42]	Zhou, Y. et al. VAE-based deep SVDD for anomaly detection. Neurocomputing 453, 131-140 (2021).
[43]	Yi, J. H. & Yoon, S. Patch SVDD: patch-level SVDD for anomaly detection and segmentation. Proceedings of the 15th Asian Conference on Computer Vision. Kyoto, Japan: Springer, 2020, 375-390.
[44]	Roth, K. et al. Towards total recall in industrial anomaly detection. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022, 14298-14308.
[45]	Yu, J. B. & Liu, J. T. Two-dimensional principal component analysis-based convolutional autoencoder for wafer map defect detection. IEEE Transactions on Industrial Electronics 68, 8789-8797 (2021).
[46]	Kang, H. & Kang, S. A stacking ensemble classifier with handcrafted and convolutional features for wafer map pattern classification. Computers in Industry 129, 103450 (2021).
[47]	Cheon, S. et al. Convolutional neural network for wafer surface defect classification and the detection of unknown defect class. IEEE Transactions on Semiconductor Manufacturing 32, 163-170 (2019).
[48]	Wen, G. J. et al. A novel method based on deep convolutional neural networks for wafer semiconductor surface defect inspection. IEEE Transactions on Instrumentation and Measurement 69, 9668-9680 (2020).
[49]	Kim, E. S. et al. An oversampling method for wafer map defect pattern classification considering small and imbalanced data. Computers & Industrial Engineering 162, 107767 (2021).
[50]	Tao, X. et al. Deep learning for unsupervised anomaly localization in industrial images: A survey. IEEE Transactions on Instrumentation and Measurement 71, 5018021 (2022).
[51]	Gao, Y. P. et al. A multilevel information fusion-based deep learning method for vision-based defect recognition. IEEE Transactions on Instrumentation and Measurement 69, 3980-3991 (2020).
[52]	Yang, L. M., Zhou, F. Q. & Wang, L. A scratch detection method based on deep learning and image segmentation. IEEE Transactions on Instrumentation and Measurement 71, 5015012 (2022).
[53]	Tao, X. et al. ViTALnet: anomaly on industrial textured surfaces with hybrid transformer. IEEE Transactions on Instrumentation and Measurement 72, 5009013 (2023).
[54]	Shang, H. B. et al. Defect-aware transformer network for intelligent visual surface defect detection. Advanced Engineering Informatics 55, 101882 (2023).
[55]	Vaswani, A. et al. Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, CA, USA: Curran Associates Inc., 2017, 6000-6010.
[56]	Gao, Y. H., Zhou, M. & Metaxas, D. N. UTNet: a hybrid transformer architecture for medical image segmentation. Proceedings of the 24th International Conference on Medical Image Computing and Computer Assisted Intervention. Strasbourg, France: Springer, 2021, 61-71.
[57]	Wang, S. N. et al. Linformer: self-attention with linear complexity. Preprint at https://arxiv.org/abs/2006.04768 (2020).
[58]	Bello, I. et al. Attention augmented convolutional networks. Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE, 2019, 3285-3294.
[59]	Shaw, P., Uszkoreit, J. & Vaswani, A. Self-attention with relative position representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). New Orleans, LA, USA: ACL, 2018, 464-468.
[60]	Caruana, R. Multitask learning. Machine Learning 28, 41-75 (1997).
[61]	Lin, T. Y. et al. Focal loss for dense object detection. Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017, 2999-3007.
[62]	Dureuil, V. et al. Wafer bevel shape inducing high defect density in shallow trench isolation process. Proceedings of 2010 IEEE/SEMI Advanced Semiconductor Manufacturing Conference (ASMC). San Francisco, CA, USA: IEEE, 2010, 213-216.
[63]	Huang, Y. B., Qiu, C. Y. & Yuan, K. Surface defect saliency of magnetic tile. The Visual Computer 36, 85-96 (2020).
[64]	Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data 5, 180161 (2018).
[65]	Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 2579-2605 (2008).

Block	Layer	Filters / heads / output units	Filter size	Activation function
Attention encoder	Convolution (stride 2)	16	3×3	ReLU
	ESAM	3	−	−
	Convolution (stride 2)	32	3×3	ReLU
	ESAM	3	−	−
	Convolution (stride 2)	64	3×3	ReLU
	ESAM	3	−	−
Feature extraction	AdaptiveAvgPool2D	−	−	−
	Flatten	64	−	−
Classification	Fully Connected Layer	15	−	Softmax
Decoder	Conv2DTranspose (stride 2)	32	3×3	ReLU
	Conv2DTranspose (stride 2)	16	3×3	ReLU
	Conv2DTranspose (stride 2)	1	3×3	Tanh
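
To make the layer table easier to read, the following minimal sketch traces how the feature-map shape would change through the three stride-2 convolutions of the encoder. It is an illustration under stated assumptions, not the authors' implementation: the 128×128 input size and the padding of 1 are hypothetical (the table gives neither), and the ESAM blocks are assumed to leave the feature-map shape unchanged. The function names `conv_out` and `trace_encoder` are introduced here for illustration only.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one spatial axis.

    Standard formula: floor((size + 2*padding - kernel) / stride) + 1.
    padding=1 is an assumption; the table only gives kernel and stride.
    """
    return (size + 2 * padding - kernel) // stride + 1


def trace_encoder(height, width, channels=(16, 32, 64)):
    """Return (channels, height, width) after each stride-2 convolution.

    ESAM blocks are assumed to preserve the shape, so they are not traced.
    """
    shapes = []
    for c in channels:
        height, width = conv_out(height), conv_out(width)
        shapes.append((c, height, width))
    return shapes


# 128x128 is a hypothetical input size chosen only for this example.
shapes = trace_encoder(128, 128)

# Global average pooling followed by flattening leaves one value per channel,
# which is consistent with the 64 listed for the Flatten row.
feature_dim = shapes[-1][0]

# Width of the fully connected classification layer listed in the table.
num_classes = 15

print(shapes, feature_dim, num_classes)
```

Under these assumptions each convolution halves the spatial resolution while the channel count doubles, and the adaptive average pooling makes the 64-dimensional feature vector independent of the input size.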

Model	TPR (↑)	FPR (↓)	AUC (↑)	S/B (↑)	Param #	Training time (s)	Inference time (ms)
DAE²⁰	0	0	0.3220	Inf	46K	20	5
AnoGAN³³	0.5417	0.1369	0.6912	3.06%	167K	10	83
GANomaly³⁵	0.2916	0.0728	0.5897	3.10%	910K	7	5
PatchSVDD⁴³	0.5417	0.2684	0.6354	0.45%	106K	80	10
Patchcore⁴⁴	0.7083	0.4402	0.6020	1.24%	68M	60	192
Deep-SVDD³⁶	0	0	0.7229	Inf	23K	12	19
MT-former (Proposed)	0.7083	0.2491	0.7821	2.20%	66K	46	24
Notes. TPR, FPR, AUC, and S/B are reported on the SK-defect dataset. Training time is the time required to train one epoch. Inference time is the time per image.
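
As a reading aid for the TPR, FPR, and AUC columns, the sketch below computes the three quantities from anomaly scores using their standard definitions. It is not the authors' evaluation code: the scores, labels, and threshold are made-up toy values, and the function names `tpr_fpr` and `auc` are hypothetical. The S/B column is not covered because its definition is not given in this section.

```python
def tpr_fpr(scores, labels, threshold):
    """True/false positive rates at one threshold.

    labels: 1 = novel defect (positive), 0 = known defect (negative).
    A sample is flagged as novel when its score exceeds the threshold.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg


def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a random positive scores higher than a
    random negative, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example only.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

tpr, fpr = tpr_fpr(scores, labels, threshold=0.5)
area = auc(scores, labels)
print(tpr, fpr, area)
```

TPR and FPR depend on the chosen threshold, whereas AUC summarizes the ranking over all thresholds. This is why a row can show TPR = 0 and FPR = 0 (no sample flagged at the operating threshold) and still have a non-zero AUC.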

Model	Magnetic tile defect	HAM10000 (0)	HAM10000 (1)	HAM10000 (2)	HAM10000 (3)	HAM10000 (4)	HAM10000 (5)	HAM10000 (6)	HAM10000 (Mean)
DAE²⁰	0.3258	0.7220	0.6828	0.6214	0.4388	0.6376	0.3366	0.5807	0.5743
AnoGAN³³	0.5360	0.5008	0.6279	0.4919	0.5769	0.4827	0.5426	0.5980	0.5444
GANomaly³⁵	0.6644	0.6981	0.6505	0.6027	0.5582	0.5493	0.4042	0.4512	0.5592
PatchSVDD⁴³	0.6357	0.4853	0.4842	0.5000	0.3908	0.6420	0.4707	0.4858	0.4941
Patchcore⁴⁴	0.3840	0.4457	0.4004	0.3935	0.3758	0.4958	0.1646	0.3783	0.3791
Deep-SVDD³⁶	0.3540	0.5103	0.5000	0.5444	0.5071	0.5073	0.4757	0.4502	0.4993
MT-former (Proposed)	0.6798	0.7480	0.8193	0.6767	0.8020	0.6807	0.4324	0.7918	0.7073
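
The Mean column appears to be the arithmetic mean of the seven HAM10000 columns (0 to 6). The short check below reproduces the MT-former entry from the values in the table. The variable names are chosen here for illustration.

```python
# HAM10000 values for MT-former, columns 0 to 6, copied from the table above.
mt_former_ham10000 = [0.7480, 0.8193, 0.6767, 0.8020, 0.6807, 0.4324, 0.7918]

# Arithmetic mean over the seven columns, rounded to the table's four decimals.
mean_value = round(sum(mt_former_ham10000) / len(mt_former_ham10000), 4)

print(mean_value)
```

The result, 0.7073, matches both the Mean entry in the table and the 70.73% HAM10000 AUC quoted in the abstract.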

Configuration	MTL-DAE	MTL-SVDD	Focal	ESAM	TPR (↑)	FPR (↓)	AUC (↑)
(a)					0	0	0.7229
(b)	√				0.1667	0.0342	0.7751
(c)		√			0.4583	0.3032	0.5489
(d)	√	√			0.4583	0.1795	0.7268
(e)	√	√	√		0.4583	0.2056	0.6658
(f)	√	√		√	0.3750	0.1579	0.6527
(g)	√	√	√	√	0.7083	0.2491	0.7821

Number of heads	TPR (↑)	FPR (↓)	AUC (↑)
1	0.4583	0.2555	0.6497
2	0.2917	0.1985	0.4696
3	0.7083	0.2491	0.7821
4	0.2917	0.2346	0.4993
5	0.4167	0.2775	0.5039
