Parametric time-frequency domain spatial audio
Publisher
John Wiley & Sons, Inc
Publication Date
[2018]
Language
English
ISBN
9781119252597
111925261
9781119252610
Table of Contents
From the eBook
List of Contributors xiii
Preface xv
About the Companion Website xix
Part I Analysis and Synthesis of Spatial Sound 1
1 Time-Frequency Processing: Methods and Tools 3 /Juha Vilkamo and Tom Backstrom
1.1 Introduction 3
1.2 Time-Frequency Processing 4
1.2.1 Basic Structure 4
1.2.2 Uniform Filter Banks 5
1.2.3 Prototype Filters and Modulation 6
1.2.4 A Robust Complex-Modulated Filter Bank, and Comparison with STFT 8
1.2.5 Overlap-Add and Windowing 12
1.2.6 Example Implementation of a Robust Filter Bank in Matlab 13
1.2.7 Cascaded Filters 15
1.3 Processing of Spatial Audio 16
1.3.1 Stochastic Estimates 17
1.3.2 Decorrelation 18
1.3.3 Optimal and Generalized Solution for Spatial Sound Processing Using Covariance Matrices 19
References 23
2 Spatial Decomposition by Spherical Array Processing 25 /David Lou Alon and Boaz Rafaely
2.1 Introduction 25
2.2 Sound Field Measurement by a Spherical Array 26
2.3 Array Processing and Plane-Wave Decomposition 26
2.4 Sensitivity to Noise and Standard Regularization Methods 29
2.5 Optimal Noise-Robust Design 32
2.5.1 PWD Estimation Error Measure 32
2.5.2 PWD Error Minimization 34
2.5.3 R-PWD Simulation Study 35
2.6 Spatial Aliasing and High Frequency Performance Limit 37
2.7 High Frequency Bandwidth Extension by Aliasing Cancellation 39
2.7.1 Spatial Aliasing Error 39
2.7.2 AC-PWD Simulation Study 40
2.8 High Performance Broadband PWD Example 42
2.8.1 Broadband Measurement Model 42
2.8.2 Minimizing Broadband PWD Error 42
2.8.3 BB-PWD Simulation Study 44
2.9 Summary 45
2.10 Acknowledgment 46
References 46
3 Sound Field Analysis Using Sparse Recovery 49 /Craig T. Jin, Nicolas Epain, and Tahereh Noohi
3.1 Introduction 49
3.2 The Plane-Wave Decomposition Problem 50
3.2.1 Sparse Plane-Wave Decomposition 51
3.2.2 The Iteratively Reweighted Least-Squares Algorithm 51
3.3 Bayesian Approach to Plane-Wave Decomposition 53
3.4 Calculating the IRLS Noise-Power Regularization Parameter 55
3.4.1 Estimation of the Relative Noise Power 56
3.5 Numerical Simulations 58
3.6 Experiment: Echoic Sound Scene Analysis 59
3.7 Conclusions 65
Appendix 65
References 66
Part II Reproduction of Spatial Sound 69
4 Overview of Time-Frequency Domain Parametric Spatial Audio Techniques 71 /Archontis Politis, Symeon Delikaris-Manias, and Ville Pulkki
4.1 Introduction 71
4.2 Parametric Processing Overview 73
4.2.1 Analysis Principles 74
4.2.2 Synthesis Principles 75
4.2.3 Spatial Audio Coding and Up-Mixing 76
4.2.4 Spatial Sound Recording and Reproduction 78
4.2.5 Auralization of Measured Room Acoustics and Spatial Rendering of Room Impulse Responses 81
References 82
5 First-Order Directional Audio Coding (DirAC) 89 /Ville Pulkki, Archontis Politis, Mikko-Ville Laitinen, Juha Vilkamo, and Jukka Ahonen
5.1 Representing Spatial Sound with First-Order B-Format Signals 89
5.2 Some Notes on the Evolution of the Technique 92
5.3 DirAC with Ideal B-Format Signals 94
5.4 Analysis of Directional Parameters with Real Microphone Setups 97
5.4.1 DOA Analysis with Open 2D Microphone Arrays 97
5.4.2 DOA Analysis with 2D Arrays with a Rigid Baffle 99
5.4.3 DOA Analysis in Underdetermined Cases 101
5.4.4 DOA Analysis: Further Methods 102
5.4.5 Effect of Spatial Aliasing and Microphone Noise on the Analysis of Diffuseness 103
5.5 First-Order DirAC with Monophonic Audio Transmission 105
5.6 First-Order DirAC with Multichannel Audio Transmission 106
5.6.1 Stream-Based Virtual Microphone Rendering 106
5.6.2 Evaluation of Virtual Microphone DirAC 109
5.6.3 Discussion of Virtual Microphone DirAC 111
5.6.4 Optimized DirAC Synthesis 111
5.6.5 DirAC-Based Reproduction of Spaced-Array Recordings 114
5.7 DirAC Synthesis for Headphones and for Hearing Aids 117
5.7.1 Reproduction of B-Format Signals 117
5.7.2 DirAC in Hearing Aids 118
5.8 Optimizing the Time-Frequency Resolution of DirAC for Critical Signals 119
5.9 Example Implementation 120
5.9.1 Executing DirAC and Plotting Parameter History 122
5.9.2 DirAC Initialization 125
5.9.3 DirAC Runtime 131
5.9.4 Simplistic Binaural Synthesis of Loudspeaker Listening 136
5.10 Summary 137
References 138
6 Higher-Order Directional Audio Coding 141 /Archontis Politis and Ville Pulkki
6.1 Introduction 141
6.2 Sound Field Model 144
6.3 Energetic Analysis and Estimation of Parameters 145
6.3.1 Analysis of Intensity and Diffuseness in the Spherical Harmonic Domain 146
6.3.2 Higher-Order Energetic Analysis 147
6.3.3 Sector Profiles 149
6.4 Synthesis of Target Setup Signals 151
6.4.1 Loudspeaker Rendering 152
6.4.2 Binaural Rendering 155
6.5 Subjective Evaluation 157
6.6 Conclusions 157
References 158
7 Multi-Channel Sound Acquisition Using a Multi-Wave Sound Field Model 161 /Oliver Thiergart and Emanuel Habets
7.1 Introduction 161
7.2 Parametric Sound Acquisition and Processing 163
7.2.1 Problem Formulation 163
7.2.2 Principal Estimation of the Target Signal 166
7.3 Multi-Wave Sound Field and Signal Model 167
7.3.1 Direct Sound Model 168
7.3.2 Diffuse Sound Model 169
7.3.3 Noise Model 169
7.4 Direct and Diffuse Signal Estimation 170
7.4.1 Estimation of the Direct Signal Ys(k, n) 170
7.4.2 Estimation of the Diffuse Signal Yd(k, n) 176
7.5 Parameter Estimation 179
7.5.1 Estimation of the Number of Sources 179
7.5.2 Direction of Arrival Estimation 181
7.5.3 Microphone Input PSD Matrix 181
7.5.4 Noise PSD Estimation 182
7.5.5 Diffuse Sound PSD Estimation 182
7.5.6 Signal PSD Estimation in Multi-Wave Scenarios 185
7.6 Application to Spatial Sound Reproduction 186
7.6.1 State of the Art 186
7.6.2 Spatial Sound Reproduction Based on Informed Spatial Filtering 187
7.7 Summary 194
References 195
8 Adaptive Mixing of Excessively Directive and Robust Beamformers for Reproduction of Spatial Sound 201 /Symeon Delikaris-Manias and Juha Vilkamo
8.1 Introduction 201
8.2 Notation and Signal Model 202
8.3 Overview of the Method 203
8.4 Loudspeaker-Based Spatial Sound Reproduction 204
8.4.1 Estimation of the Target Covariance Matrix Cy 204
8.4.2 Estimation of the Synthesis Beamforming Signals Ws 206
8.4.3 Processing the Synthesis Signals (Wsx) to Obtain the Target Covariance Matrix Cy 206
8.4.4 Spatial Energy Distribution 207
8.4.5 Listening Tests 208
8.5 Binaural-Based Spatial Sound Reproduction 209
8.5.1 Estimation of the Analysis and Synthesis Beamforming Weight Matrices 210
8.5.2 Diffuse-Field Equalization of HRTFs 210
8.5.3 Adaptive Mixing and Decorrelation 211
8.5.4 Subjective Evaluation 211
8.6 Conclusions 212
References 212
9 Source Separation and Reconstruction of Spatial Audio Using Spectrogram Factorization 215 /Joonas Nikunen and Tuomas Virtanen
9.1 Introduction 215
9.2 Spectrogram Factorization 217
9.2.1 Mixtures of Sounds 217
9.2.2 Magnitude Spectrogram Models 218
9.2.3 Complex-Valued Spectrogram Models 221
9.2.4 Source Separation by Time-Frequency Filtering 225
9.3 Array Signal Processing and Spectrogram Factorization 226
9.3.1 Spaced Microphone Arrays 226
9.3.2 Model for Spatial Covariance Based on Direction of Arrival 227
9.3.3 Complex-Valued NMF with the Spatial Covariance Model 229
9.4 Applications of Spectrogram Factorization in Spatial Audio 231
9.4.1 Parameterization of Surround Sound: Upmixing by Time-Frequency Filtering 231
9.4.2 Source Separation Using a Compact Microphone Array 233
9.4.3 Reconstruction of Binaural Sound Through Source Separation 238
9.5 Discussion 243
9.6 Matlab Example 243
References 247
Part III Signal-Dependent Spatial Filtering 251
10 Time-Frequency Domain Spatial Audio Enhancement 253 /Symeon Delikaris-Manias and Pasi Pertila
10.1 Introduction 253
10.2 Signal-Independent Enhancement 254
10.3 Signal-Dependent Enhancement 255
10.3.1 Adaptive Beamformers 255
10.3.2 Post-Filters 257
10.3.3 Post-Filter Types 257
10.3.4 Estimating Post-Filters with Machine Learning 259
10.3.5 Post-Filter Design Based on Spatial Parameters 259
References 261
11 Cross-Spectrum-Based Post-Filter Utilizing Noisy and Robust Beamformers 265 /Symeon Delikaris-Manias and Ville Pulkki
11.1 Introduction 265
11.2 Notation and Signal Model 267
11.2.1 Virtual Microphone Design Utilizing Pressure Microphones 268
11.3 Estimation of the Cross-Spectrum-Based Post-Filter 269
11.3.1 Post-Filter Estimation Utilizing Two Static Beamformers 270
11.3.2 Post-Filter Estimation Utilizing a Static and an Adaptive Beamformer 272
11.3.3 Smoothing Techniques 277
11.4 Implementation Examples 279
11.4.1 Ideal Conditions 279
11.4.2 Prototype Microphone Arrays 281
11.5 Conclusions and Further Remarks 283
11.6 Source Code 284
References 287
12 Microphone-Array-Based Speech Enhancement Using Neural Networks 291 /Pasi Pertila
12.1 Introduction 291
12.2 Time-Frequency Masks for Speech Enhancement Using Supervised Learning 293
12.2.1 Beamforming with Post-Filtering 293
12.2.2 Overview of Mask Prediction 294
12.2.3 Features for Mask Learning 295
12.2.4 Target Mask Design 297
12.3 Artificial Neural Networks 298
12.3.1 Learning the Weights 299
12.3.2 Generalization 301
12.3.3 Deep Neural Networks 305
12.4 Mask Learning: A Simulated Example 305
12.4.1 Feature Extraction 306
12.4.2 Target Mask Design 306
12.4.3 Neural Network Training 307
12.4.4 Results 308
12.5 Mask Learning: A Real-World Example 310
12.5.1 Brief Description of the Third CHiME Challenge Data 310
12.5.2 Data Processing and Beamforming 312
12.5.3 Description of Network Structure, Features, and Targets 312
12.5.4 Mask Prediction Results and Discussion 314
12.5.5 Speech Enhancement Results 316
12.6 Conclusions 318
12.7 Source Code 318
12.7.1 Matlab Code for Neural-Network-Based Sawtooth Denoising Example 318
12.7.2 Matlab Code for Phase Feature Extraction 321
References 324
Part IV Applications 327
13 Upmixing and Beamforming in Professional Audio 329 /Christof Faller
13.1 Introduction 329
13.2 Stereo-to-Multichannel Upmix Processor 329
13.2.1 Product Description 329
13.2.2 Considerations for Professional Audio and Broadcast 331
13.2.3 Signal Processing 332
13.3 Digitally Enhanced Shotgun Microphone 336
13.3.1 Product Description 336
13.3.2 Concept 336
13.3.3 Signal Processing 336
13.3.4 Evaluations and Measurements 339
13.4 Surround Microphone System Based on Two Microphone Elements 341
13.4.1 Product Description 341
13.4.2 Concept 344
13.5 Summary 345
References 345
14 Spatial Sound Scene Synthesis and Manipulation for Virtual Reality and Audio Effects 347 /Ville Pulkki, Archontis Politis, Tapani Pihlajamaki, and Mikko-Ville Laitinen
14.1 Introduction 347
14.2 Parametric Sound Scene Synthesis for Virtual Reality 348
14.2.1 Overall Structure 348
14.2.2 Synthesis of Virtual Sources 350
14.2.3 Synthesis of Room Reverberation 352
14.2.4 Augmentation of Virtual Reality with Real Spatial Recordings 352
14.2.5 Higher-Order Processing 353
14.2.6 Loudspeaker-Signal Bus 354
14.3 Spatial Manipulation of Sound Scenes 355
14.3.1 Parametric Directional Transformations 356
14.3.2 Sweet-Spot Translation and Zooming 356
14.3.3 Spatial Filtering 356
14.3.4 Spatial Modulation 357
14.3.5 Diffuse Field Level Control 358
14.3.6 Ambience Extraction 359
14.3.7 Spatialization of Monophonic Signals 360
14.4 Summary 360
References 361
15 Parametric Spatial Audio Techniques in Teleconferencing and Remote Presence 363 /Anastasios Alexandridis, Despoina Pavlidi, Nikolaos Stefanakis, and Athanasios Mouchtaris
15.1 Introduction and Motivation 363
15.2 Background 365
15.3 Immersive Audio Communication System (ImmACS) 366
15.3.1 Encoder 366
15.3.2 Decoder 373
15.4 Capture and Reproduction of Crowded Acoustic Environments 376
15.4.1 Sound Source Positioning Based on VBAP 376
15.4.2 Non-Parametric Approach 377
15.4.3 Parametric Approach 379
15.4.4 Example Application 382
15.5 Conclusions 384
References 384
Index 387