Compare commits
	
		
			1 Commits
		
	
	
		
	
	| Author | SHA1 | Date | |
|---|---|---|---|
| 235e0b6365 | 
							
								
								
									
										3
									
								
								.gitignore
									
									
									
									
										vendored
									
									
								
							
							
						
						
									
										3
									
								
								.gitignore
									
									
									
									
										vendored
									
									
								
							| @ -60,6 +60,3 @@ target/ | |||||||
| 
 | 
 | ||||||
| # PyCharm | # PyCharm | ||||||
| .idea/ | .idea/ | ||||||
| 
 |  | ||||||
| .mypy_cache/ |  | ||||||
| .pytest_cache/ |  | ||||||
|  | |||||||
							
								
								
									
										24
									
								
								README.md
									
									
									
									
									
								
							
							
						
						
									
										24
									
								
								README.md
									
									
									
									
									
								
							| @ -6,10 +6,10 @@ electromagnetic solver implemented in Python and OpenCL. | |||||||
| 
 | 
 | ||||||
| **Capabilities:** | **Capabilities:** | ||||||
| * Arbitrary distributions of the following: | * Arbitrary distributions of the following: | ||||||
|     * Dielectric constant (`epsilon`) |     * Dielectric constant (```epsilon```) | ||||||
|     * Magnetic permeabilty (`mu`) |     * Magnetic permeabilty (```mu```) | ||||||
|     * Perfect electric conductor (`PEC`) |     * Perfect electric conductor (```PEC```) | ||||||
|     * Perfect magnetic conductor (`PMC`) |     * Perfect magnetic conductor (```PMC```) | ||||||
| * Variable-sized rectangular grids | * Variable-sized rectangular grids | ||||||
|     * Stretched-coordinate PMLs (complex cell sizes allowed) |     * Stretched-coordinate PMLs (complex cell sizes allowed) | ||||||
| 
 | 
 | ||||||
| @ -17,10 +17,10 @@ Currently, only periodic boundary conditions are included. | |||||||
| PEC/PMC boundaries can be implemented by drawing PEC/PMC cells near the edges. | PEC/PMC boundaries can be implemented by drawing PEC/PMC cells near the edges. | ||||||
| Bloch boundary conditions are not included but wouldn't be very hard to add. | Bloch boundary conditions are not included but wouldn't be very hard to add. | ||||||
| 
 | 
 | ||||||
| The default solver `opencl_fdfd.cg_solver(...)` located in main.py | The default solver ```opencl_fdfd.cg_solver(...)``` located in main.py | ||||||
| implements the E-field wave operator directly (ie, as a list of OpenCL | implements the E-field wave operator directly (ie, as a list of OpenCL | ||||||
| instructions rather than a matrix). Additionally, there is a slower | instructions rather than a matrix). Additionally, there is a slower | ||||||
| (and slightly more versatile) solver in `csr.py` which attempts to solve | (and slightly more versatile) solver in ```csr.py``` which attempts to solve | ||||||
| an arbitrary sparse matrix in compressed sparse row (CSR) format using | an arbitrary sparse matrix in compressed sparse row (CSR) format using | ||||||
| the same conjugate gradient method as the default solver. The CSR solver | the same conjugate gradient method as the default solver. The CSR solver | ||||||
| is significantly slower, but can be very useful for testing alternative | is significantly slower, but can be very useful for testing alternative | ||||||
| @ -34,29 +34,29 @@ generalization to multiple GPUs should be pretty straightforward | |||||||
| ## Installation | ## Installation | ||||||
| 
 | 
 | ||||||
| **Dependencies:** | **Dependencies:** | ||||||
| * python 3 (written and tested with 3.7) | * python 3 (written and tested with 3.5)  | ||||||
| * numpy | * numpy | ||||||
| * pyopencl | * pyopencl | ||||||
| * jinja2 | * jinja2 | ||||||
| * [meanas](https://mpxd.net/code/jan/meanas) (>=0.5) | * [fdfd_tools](https://mpxd.net/gogs/jan/fdfd_tools) (>=0.2) | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| Install with pip, via git: | Install with pip, via git: | ||||||
| ```bash | ```bash | ||||||
| pip install git+https://mpxd.net/code/jan/opencl_fdfd.git@release | pip install git+https://mpxd.net/gogs/jan/opencl_fdfd.git@release | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| ## Use | ## Use | ||||||
| 
 | 
 | ||||||
| See the documentation for `opencl_fdfd.cg_solver(...)` | See the documentation for ```opencl_fdfd.cg_solver(...)``` | ||||||
| (located in ```main.py```) for details about how to call the solver. | (located in ```main.py```) for details about how to call the solver. | ||||||
| The FDFD arguments are identical to those in | The FDFD arguments are identical to those in | ||||||
| `meanas.solvers.generic(...)`, and a few solver-specific | ```fdfd_tools.solvers.generic(...)```, and a few solver-specific | ||||||
| arguments are available. | arguments are available. | ||||||
|   |   | ||||||
| An alternate (slower) FDFD solver and a general gpu-based sparse matrix | An alternate (slower) FDFD solver and a general gpu-based sparse matrix | ||||||
| solver is available in `csr.py`. These aren't particularly | solver is available in ```csr.py```. These aren't particularly | ||||||
| well-optimized, and something like | well-optimized, and something like | ||||||
| [MAGMA](http://icl.cs.utk.edu/magma/index.html) would probably be a | [MAGMA](http://icl.cs.utk.edu/magma/index.html) would probably be a | ||||||
| better choice if you absolutely need to solve arbitrary sparse matrices | better choice if you absolutely need to solve arbitrary sparse matrices | ||||||
|  | |||||||
| @ -1 +0,0 @@ | |||||||
| ../LICENSE.md |  | ||||||
| @ -1 +0,0 @@ | |||||||
| ../README.md |  | ||||||
| @ -31,14 +31,13 @@ | |||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
|   Dependencies: |   Dependencies: | ||||||
|     - meanas    ( https://mpxd.net/code/jan/meanas ) |     - fdfd_tools    ( https://mpxd.net/gogs/jan/fdfd_tools ) | ||||||
|     - numpy |     - numpy | ||||||
|     - pyopencl |     - pyopencl | ||||||
|     - jinja2 |     - jinja2 | ||||||
| """ | """ | ||||||
| 
 | 
 | ||||||
| from .main import cg_solver as cg_solver | from .main import cg_solver | ||||||
| 
 | 
 | ||||||
| __author__ = 'Jan Petykiewicz' | __author__ = 'Jan Petykiewicz' | ||||||
| __version__ = '0.4' | 
 | ||||||
| version = __version__ |  | ||||||
|  | |||||||
| @ -6,7 +6,7 @@ CSRMatrix sparse matrix representation. | |||||||
| 
 | 
 | ||||||
| The FDFD solver (fdfd_cg_solver()) solves an FDFD problem by | The FDFD solver (fdfd_cg_solver()) solves an FDFD problem by | ||||||
| creating a sparse matrix representing the problem (using | creating a sparse matrix representing the problem (using | ||||||
| meanas) and then passing it to cg(), which performs a | fdfd_tools) and then passing it to cg(), which performs a | ||||||
| conjugate gradient solve. | conjugate gradient solve. | ||||||
| 
 | 
 | ||||||
| cg() is capable of solving arbitrary sparse matrices which | cg() is capable of solving arbitrary sparse matrices which | ||||||
| @ -14,66 +14,54 @@ satisfy the constraints for the 'conjugate gradient' algorithm | |||||||
| (positive definite, symmetric) and some that don't. | (positive definite, symmetric) and some that don't. | ||||||
| """ | """ | ||||||
| 
 | 
 | ||||||
| from typing import Any, TYPE_CHECKING | from typing import Dict, Any | ||||||
| import time | import time | ||||||
| import logging |  | ||||||
| 
 | 
 | ||||||
| import numpy | import numpy | ||||||
| from numpy.typing import NDArray, ArrayLike |  | ||||||
| from numpy.linalg import norm | from numpy.linalg import norm | ||||||
| from numpy import complexfloating |  | ||||||
| import pyopencl | import pyopencl | ||||||
| import pyopencl.array | import pyopencl.array | ||||||
| import meanas.fdfd.solvers | 
 | ||||||
|  | import fdfd_tools.solvers | ||||||
| 
 | 
 | ||||||
| from . import ops | from . import ops | ||||||
| 
 | 
 | ||||||
| if TYPE_CHECKING: |  | ||||||
|     import scipy |  | ||||||
| 
 | 
 | ||||||
| 
 | class CSRMatrix(object): | ||||||
| logger = logging.getLogger(__name__) |  | ||||||
| 
 |  | ||||||
| 
 |  | ||||||
| class CSRMatrix: |  | ||||||
|     """ |     """ | ||||||
|     Matrix stored in Compressed Sparse Row format, in GPU RAM. |     Matrix stored in Compressed Sparse Row format, in GPU RAM. | ||||||
|     """ |     """ | ||||||
|     row_ptr: pyopencl.array.Array |     row_ptr = None      # type: pyopencl.array.Array | ||||||
|     col_ind: pyopencl.array.Array |     col_ind = None      # type: pyopencl.array.Array | ||||||
|     data: pyopencl.array.Array |     data = None         # type: pyopencl.array.Array | ||||||
| 
 | 
 | ||||||
|     def __init__( |     def __init__(self, | ||||||
|             self, |                  queue: pyopencl.CommandQueue, | ||||||
|             queue: pyopencl.CommandQueue, |                  m: 'scipy.sparse.csr_matrix'): | ||||||
|             m: 'scipy.sparse.csr_matrix', |  | ||||||
|             ) -> None: |  | ||||||
|         self.row_ptr = pyopencl.array.to_device(queue, m.indptr) |         self.row_ptr = pyopencl.array.to_device(queue, m.indptr) | ||||||
|         self.col_ind = pyopencl.array.to_device(queue, m.indices) |         self.col_ind = pyopencl.array.to_device(queue, m.indices) | ||||||
|         self.data = pyopencl.array.to_device(queue, m.data.astype(numpy.complex128)) |         self.data = pyopencl.array.to_device(queue, m.data.astype(numpy.complex128)) | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| def cg( | def cg(A: 'scipy.sparse.csr_matrix', | ||||||
|         A: 'scipy.sparse.csr_matrix', |        b: numpy.ndarray, | ||||||
|         b: ArrayLike, |        max_iters: int = 10000, | ||||||
|         max_iters: int = 10000, |        err_threshold: float = 1e-6, | ||||||
|         err_threshold: float = 1e-6, |        context: pyopencl.Context = None, | ||||||
|         context: pyopencl.Context | None = None, |        queue: pyopencl.CommandQueue = None, | ||||||
|         queue: pyopencl.CommandQueue | None = None, |        verbose: bool = False, | ||||||
|         ) -> NDArray[complexfloating]: |        ) -> numpy.ndarray: | ||||||
|     """ |     """ | ||||||
|     General conjugate-gradient solver for sparse matrices, where A @ x = b. |     General conjugate-gradient solver for sparse matrices, where A @ x = b. | ||||||
| 
 | 
 | ||||||
|     Args: |     :param A: Matrix to solve (CSR format) | ||||||
|         A: Matrix to solve (CSR format) |     :param b: Right-hand side vector (dense ndarray) | ||||||
|         b: Right-hand side vector (dense ndarray) |     :param max_iters: Maximum number of iterations | ||||||
|         max_iters: Maximum number of iterations |     :param err_threshold: Error threshold for successful solve, relative to norm(b) | ||||||
|         err_threshold: Error threshold for successful solve, relative to norm(b) |     :param context: PyOpenCL context. Will be created if not given. | ||||||
|         context: PyOpenCL context. Will be created if not given. |     :param queue: PyOpenCL command queue. Will be created if not given. | ||||||
|         queue: PyOpenCL command queue. Will be created if not given. |     :param verbose: Whether to print statistics to screen. | ||||||
| 
 |     :return: Solution vector x; returned even if solve doesn't converge. | ||||||
|     Returns: |  | ||||||
|         Solution vector x; returned even if solve doesn't converge. |  | ||||||
|     """ |     """ | ||||||
| 
 | 
 | ||||||
|     start_time = time.perf_counter() |     start_time = time.perf_counter() | ||||||
| @ -84,10 +72,10 @@ def cg( | |||||||
|     if queue is None: |     if queue is None: | ||||||
|         queue = pyopencl.CommandQueue(context) |         queue = pyopencl.CommandQueue(context) | ||||||
| 
 | 
 | ||||||
|     def load_field(v: NDArray[numpy.complexfloating], dtype: type = numpy.complex128) -> pyopencl.array.Array: |     def load_field(v, dtype=numpy.complex128): | ||||||
|         return pyopencl.array.to_device(queue, v.astype(dtype)) |         return pyopencl.array.to_device(queue, v.astype(dtype)) | ||||||
| 
 | 
 | ||||||
|     r = load_field(numpy.asarray(b)) |     r = load_field(b) | ||||||
|     x = pyopencl.array.zeros_like(r) |     x = pyopencl.array.zeros_like(r) | ||||||
|     v = pyopencl.array.zeros_like(r) |     v = pyopencl.array.zeros_like(r) | ||||||
|     p = pyopencl.array.zeros_like(r) |     p = pyopencl.array.zeros_like(r) | ||||||
| @ -98,27 +86,29 @@ def cg( | |||||||
| 
 | 
 | ||||||
|     m = CSRMatrix(queue, A) |     m = CSRMatrix(queue, A) | ||||||
| 
 | 
 | ||||||
|     # |     ''' | ||||||
|     # Generate OpenCL kernels |     Generate OpenCL kernels | ||||||
|     # |     ''' | ||||||
|     a_step = ops.create_a_csr(context) |     a_step = ops.create_a_csr(context) | ||||||
|     xr_step = ops.create_xr_step(context) |     xr_step = ops.create_xr_step(context) | ||||||
|     rhoerr_step = ops.create_rhoerr_step(context) |     rhoerr_step = ops.create_rhoerr_step(context) | ||||||
|     p_step = ops.create_p_step(context) |     p_step = ops.create_p_step(context) | ||||||
|     dot = ops.create_dot(context) |     dot = ops.create_dot(context) | ||||||
| 
 | 
 | ||||||
|     # |     ''' | ||||||
|     # Start the solve |     Start the solve | ||||||
|     # |     ''' | ||||||
|     start_time2 = time.perf_counter() |     start_time2 = time.perf_counter() | ||||||
| 
 | 
 | ||||||
|     _, err2 = rhoerr_step(r, []) |     _, err2 = rhoerr_step(r, []) | ||||||
|     b_norm = numpy.sqrt(err2) |     b_norm = numpy.sqrt(err2) | ||||||
|     logging.debug(f'b_norm check: {b_norm}') |     if verbose: | ||||||
|  |         print('b_norm check: ', b_norm) | ||||||
| 
 | 
 | ||||||
|     success = False |     success = False | ||||||
|     for k in range(max_iters): |     for k in range(max_iters): | ||||||
|         logging.debug(f'[{k:06d}] rho {rho:.4} alpha {alpha:4.4}') |         if verbose: | ||||||
|  |             print('[{:06d}] rho {:.4} alpha {:4.4}'.format(k, rho, alpha), end=' ') | ||||||
| 
 | 
 | ||||||
|         rho_prev = rho |         rho_prev = rho | ||||||
|         e = xr_step(x, p, r, v, alpha, []) |         e = xr_step(x, p, r, v, alpha, []) | ||||||
| @ -126,7 +116,8 @@ def cg( | |||||||
| 
 | 
 | ||||||
|         errs += [numpy.sqrt(err2) / b_norm] |         errs += [numpy.sqrt(err2) / b_norm] | ||||||
| 
 | 
 | ||||||
|         logging.debug(f'err {errs[-1]}') |         if verbose: | ||||||
|  |             print('err', errs[-1]) | ||||||
| 
 | 
 | ||||||
|         if errs[-1] < err_threshold: |         if errs[-1] < err_threshold: | ||||||
|             success = True |             success = True | ||||||
| @ -136,60 +127,53 @@ def cg( | |||||||
|         e = a_step(v, m, p, e) |         e = a_step(v, m, p, e) | ||||||
|         alpha = rho / dot(p, v, e) |         alpha = rho / dot(p, v, e) | ||||||
| 
 | 
 | ||||||
|         if k % 1000 == 0: |         if verbose and k % 1000 == 0: | ||||||
|             logger.info(f'iteration {k}') |             print(k) | ||||||
| 
 | 
 | ||||||
|     # |     ''' | ||||||
|     # Done solving |     Done solving | ||||||
|     # |     ''' | ||||||
|     time_elapsed = time.perf_counter() - start_time |     time_elapsed = time.perf_counter() - start_time | ||||||
| 
 | 
 | ||||||
|     x = x.get() |     x = x.get() | ||||||
| 
 | 
 | ||||||
|     if success: |     if verbose: | ||||||
|         logging.info('Solve success') |         if success: | ||||||
|     else: |             print('Success', end='') | ||||||
|         logging.warning('Solve failure') |         else: | ||||||
|     logging.info(f'{k} iterations in {time_elapsed} sec: {k / time_elapsed} iterations/sec') |             print('Failure', end=', ') | ||||||
|     logging.debug(f'final error {errs[-1]}') |         print(', {} iterations in {} sec: {} iterations/sec \ | ||||||
|     logging.debug(f'overhead {start_time2 - start_time} sec') |                       '.format(k, time_elapsed, k / time_elapsed)) | ||||||
|  |         print('final error', errs[-1]) | ||||||
|  |         print('overhead {} sec'.format(start_time2 - start_time)) | ||||||
| 
 | 
 | ||||||
|     residual = norm(A @ x - b) / norm(b) |         print('Final residual:', norm(A @ x - b) / norm(b)) | ||||||
|     logging.info(f'Final residual: {residual}') |  | ||||||
|     return x |     return x | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| def fdfd_cg_solver( | def fdfd_cg_solver(solver_opts: Dict[str, Any] = None, | ||||||
|         solver_opts: dict[str, Any] | None = None, |                    **fdfd_args | ||||||
|         **fdfd_args, |                    ) -> numpy.ndarray: | ||||||
|         ) -> NDArray[complexfloating]: |  | ||||||
|     """ |     """ | ||||||
|     Conjugate gradient FDFD solver using CSR sparse matrices, mainly for |     Conjugate gradient FDFD solver using CSR sparse matrices, mainly for | ||||||
|      testing and development since it's much slower than the solver in main.py. |      testing and development since it's much slower than the solver in main.py. | ||||||
| 
 | 
 | ||||||
|     Calls meanas.fdfd.solvers.generic( |     Calls fdfd_tools.solvers.generic(**fdfd_args, | ||||||
|         **fdfd_args, |                                      matrix_solver=opencl_fdfd.csr.cg, | ||||||
|         matrix_solver=opencl_fdfd.csr.cg, |                                      matrix_solver_opts=solver_opts) | ||||||
|         matrix_solver_opts=solver_opts, |  | ||||||
|         ) |  | ||||||
| 
 | 
 | ||||||
|     Args: |     :param solver_opts: Passed as matrix_solver_opts to fdfd_tools.solver.generic(...). | ||||||
|         solver_opts: Passed as matrix_solver_opts to fdfd_tools.solver.generic(...). |         Default {}. | ||||||
|             Default {}. |     :param fdfd_args: Passed as **fdfd_args to fdfd_tools.solver.generic(...). | ||||||
|         fdfd_args: Passed as **fdfd_args to fdfd_tools.solver.generic(...). |         Should include all of the arguments **except** matrix_solver and matrix_solver_opts | ||||||
|             Should include all of the arguments **except** matrix_solver and matrix_solver_opts |     :return: E-field which solves the system. | ||||||
| 
 |  | ||||||
|     Returns: |  | ||||||
|         E-field which solves the system. |  | ||||||
|     """ |     """ | ||||||
| 
 | 
 | ||||||
|     if solver_opts is None: |     if solver_opts is None: | ||||||
|         solver_opts = dict() |         solver_opts = dict() | ||||||
| 
 | 
 | ||||||
|     x = meanas.fdfd.solvers.generic( |     x = fdfd_tools.solvers.generic(matrix_solver=cg, | ||||||
|         matrix_solver=cg, |                                    matrix_solver_opts=solver_opts, | ||||||
|         matrix_solver_opts=solver_opts, |                                    **fdfd_args) | ||||||
|         **fdfd_args, |  | ||||||
|         ) |  | ||||||
| 
 | 
 | ||||||
|     return x |     return x | ||||||
|  | |||||||
| @ -31,7 +31,7 @@ __global char *pmc_z = pmc + ZZ; | |||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| //Update H components; set them to 0 if PMC is enabled at that location. | //Update H components; set them to 0 if PMC is enabled at that location. | ||||||
| //Mu division and PMC conditional are only included if {mu} and {pmc} are true | //Mu division and PMC conditional are only included if {{mu}} and {{pmc}} are true | ||||||
| {% if pmc -%} | {% if pmc -%} | ||||||
| if (pmc_x[i] != 0) { | if (pmc_x[i] != 0) { | ||||||
|     Hx[i] = zero; |     Hx[i] = zero; | ||||||
| @ -42,9 +42,9 @@ if (pmc_x[i] != 0) { | |||||||
|     ctype Dyz = mul(sub(Ey[i + pz], Ey[i]), inv_dez[z]); |     ctype Dyz = mul(sub(Ey[i + pz], Ey[i]), inv_dez[z]); | ||||||
|     ctype x_curl = sub(Dzy, Dyz); |     ctype x_curl = sub(Dzy, Dyz); | ||||||
| 
 | 
 | ||||||
|     {%- if mu %} |     {%- if mu -%} | ||||||
|     Hx[i] = mul(inv_mu_x[i], x_curl); |     Hx[i] = mul(inv_mu_x[i], x_curl); | ||||||
|     {%- else %} |     {%- else -%} | ||||||
|     Hx[i] = x_curl; |     Hx[i] = x_curl; | ||||||
|     {%- endif %} |     {%- endif %} | ||||||
| } | } | ||||||
| @ -59,9 +59,9 @@ if (pmc_y[i] != 0) { | |||||||
|     ctype Dzx = mul(sub(Ez[i + px], Ez[i]), inv_dex[x]); |     ctype Dzx = mul(sub(Ez[i + px], Ez[i]), inv_dex[x]); | ||||||
|     ctype y_curl = sub(Dxz, Dzx); |     ctype y_curl = sub(Dxz, Dzx); | ||||||
| 
 | 
 | ||||||
|     {%- if mu %} |     {%- if mu -%} | ||||||
|     Hy[i] = mul(inv_mu_y[i], y_curl); |     Hy[i] = mul(inv_mu_y[i], y_curl); | ||||||
|     {%- else %} |     {%- else -%} | ||||||
|     Hy[i] = y_curl; |     Hy[i] = y_curl; | ||||||
|     {%- endif %} |     {%- endif %} | ||||||
| } | } | ||||||
| @ -76,9 +76,9 @@ if (pmc_z[i] != 0) { | |||||||
|     ctype Dxy = mul(sub(Ex[i + py], Ex[i]), inv_dey[y]); |     ctype Dxy = mul(sub(Ex[i + py], Ex[i]), inv_dey[y]); | ||||||
|     ctype z_curl = sub(Dyx, Dxy); |     ctype z_curl = sub(Dyx, Dxy); | ||||||
| 
 | 
 | ||||||
|     {%- if mu %} |     {%- if mu -%} | ||||||
|     Hz[i] = mul(inv_mu_z[i], z_curl); |     Hz[i] = mul(inv_mu_z[i], z_curl); | ||||||
|     {%- else %} |     {%- else -%} | ||||||
|     Hz[i] = z_curl; |     Hz[i] = z_curl; | ||||||
|     {%- endif %} |     {%- endif %} | ||||||
| } | } | ||||||
|  | |||||||
| @ -5,70 +5,67 @@ This file holds the default FDFD solver, which uses an E-field wave | |||||||
| operator implemented directly as OpenCL arithmetic (rather than as | operator implemented directly as OpenCL arithmetic (rather than as | ||||||
| a matrix). | a matrix). | ||||||
| """ | """ | ||||||
|  | 
 | ||||||
|  | from typing import List | ||||||
| import time | import time | ||||||
| import logging |  | ||||||
| 
 | 
 | ||||||
| import numpy | import numpy | ||||||
| from numpy.typing import NDArray, ArrayLike |  | ||||||
| from numpy.linalg import norm | from numpy.linalg import norm | ||||||
| from numpy import floating, complexfloating |  | ||||||
| import pyopencl | import pyopencl | ||||||
| import pyopencl.array | import pyopencl.array | ||||||
| 
 | 
 | ||||||
| import meanas.fdfd.operators | import fdfd_tools.operators | ||||||
| 
 | 
 | ||||||
| from . import ops | from . import ops | ||||||
| 
 | 
 | ||||||
| 
 | __author__ = 'Jan Petykiewicz' | ||||||
| logger = logging.getLogger(__name__) |  | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| def cg_solver( | def cg_solver(omega: complex, | ||||||
|         omega: complex, |               dxes: List[List[numpy.ndarray]], | ||||||
|         dxes: list[list[NDArray[floating | complexfloating]]], |               J: numpy.ndarray, | ||||||
|         J: ArrayLike, |               epsilon: numpy.ndarray, | ||||||
|         epsilon: ArrayLike, |               mu: numpy.ndarray = None, | ||||||
|         mu: ArrayLike | None = None, |               pec: numpy.ndarray = None, | ||||||
|         pec: ArrayLike | None = None, |               pmc: numpy.ndarray = None, | ||||||
|         pmc: ArrayLike | None = None, |               adjoint: bool = False, | ||||||
|         adjoint: bool = False, |               max_iters: int = 40000, | ||||||
|         max_iters: int = 40000, |               err_threshold: float = 1e-6, | ||||||
|         err_threshold: float = 1e-6, |               context: pyopencl.Context = None, | ||||||
|         context: pyopencl.Context | None = None, |               verbose: bool = False, | ||||||
|         ) -> NDArray: |               ) -> numpy.ndarray: | ||||||
|     """ |     """ | ||||||
|     OpenCL FDFD solver using the iterative conjugate gradient (cg) method |     OpenCL FDFD solver using the iterative conjugate gradient (cg) method | ||||||
|      and implementing the diagonalized E-field wave operator directly in |      and implementing the diagonalized E-field wave operator directly in | ||||||
|      OpenCL. |      OpenCL. | ||||||
| 
 | 
 | ||||||
|     All ndarray arguments should be 1D arrays. To linearize a list of 3 3D ndarrays, |     All ndarray arguments should be 1D arrays. To linearize a list of 3 3D ndarrays, | ||||||
|      either use meanas.fdmath.vec() or numpy: |      either use fdfd_tools.vec() or numpy: | ||||||
|      f_1D = numpy.hstack(tuple((fi.flatten(order='F') for fi in [f_x, f_y, f_z]))) |      f_1D = numpy.hstack(tuple((fi.flatten(order='F') for fi in [f_x, f_y, f_z]))) | ||||||
| 
 | 
 | ||||||
|     Args: |     :param omega: Complex frequency to solve at. | ||||||
|         omega: Complex frequency to solve at. |     :param dxes: [[dx_e, dy_e, dz_e], [dx_h, dy_h, dz_h]] (complex cell sizes) | ||||||
|         dxes: [[dx_e, dy_e, dz_e], [dx_h, dy_h, dz_h]] (complex cell sizes) |     :param J: Electric current distribution (at E-field locations) | ||||||
|         J: Electric current distribution (at E-field locations) |     :param epsilon: Dielectric constant distribution (at E-field locations) | ||||||
|         epsilon: Dielectric constant distribution (at E-field locations) |     :param mu: Magnetic permeability distribution (at H-field locations) | ||||||
|         mu: Magnetic permeability distribution (at H-field locations) |     :param pec: Perfect electric conductor distribution | ||||||
|         pec: Perfect electric conductor distribution |         (at E-field locations; non-zero value indicates PEC is present) | ||||||
|             (at E-field locations; non-zero value indicates PEC is present) |     :param pmc: Perfect magnetic conductor distribution | ||||||
|         pmc: Perfect magnetic conductor distribution |         (at H-field locations; non-zero value indicates PMC is present) | ||||||
|             (at H-field locations; non-zero value indicates PMC is present) |     :param adjoint: If true, solves the adjoint problem. | ||||||
|         adjoint: If true, solves the adjoint problem. |     :param max_iters: Maximum number of iterations. Default 40,000. | ||||||
|         max_iters: Maximum number of iterations. Default 40,000. |     :param err_threshold: If (r @ r.conj()) / norm(1j * omega * J) < err_threshold, success. | ||||||
|         err_threshold: If (r @ r.conj()) / norm(1j * omega * J) < err_threshold, success. |         Default 1e-6. | ||||||
|             Default 1e-6. |     :param context: PyOpenCL context to run in. If not given, construct a new context. | ||||||
|         context: PyOpenCL context to run in. If not given, construct a new context. |     :param verbose: If True, print progress to stdout. Default False. | ||||||
| 
 |     :return: E-field which solves the system. Returned even if we did not converge. | ||||||
|     Returns: |  | ||||||
|         E-field which solves the system. Returned even if we did not converge. |  | ||||||
|     """ |     """ | ||||||
|  | 
 | ||||||
|     start_time = time.perf_counter() |     start_time = time.perf_counter() | ||||||
| 
 | 
 | ||||||
|     shape = [dd.size for dd in dxes[0]] |     b = -1j * omega * J | ||||||
| 
 | 
 | ||||||
|     b = -1j * omega * numpy.asarray(J) |     shape = [d.size for d in dxes[0]] | ||||||
| 
 | 
 | ||||||
|     ''' |     ''' | ||||||
|         ** In this comment, I use the following notation: |         ** In this comment, I use the following notation: | ||||||
| @ -97,29 +94,30 @@ def cg_solver( | |||||||
|         We can accomplish all this simply by conjugating everything (except J) and |         We can accomplish all this simply by conjugating everything (except J) and | ||||||
|          reversing the order of L and R |          reversing the order of L and R | ||||||
|     ''' |     ''' | ||||||
|     epsilon = numpy.asarray(epsilon) |  | ||||||
| 
 |  | ||||||
|     if adjoint: |     if adjoint: | ||||||
|         # Conjugate everything |         # Conjugate everything | ||||||
|         dxes = [[numpy.conj(dd) for dd in dds] for dds in dxes] |         dxes = [[numpy.conj(d) for d in dd] for dd in dxes] | ||||||
|         omega = numpy.conj(omega) |         omega = numpy.conj(omega) | ||||||
|         epsilon = numpy.conj(epsilon) |         epsilon = numpy.conj(epsilon) | ||||||
|         if mu is not None: |         if mu is not None: | ||||||
|             mu = numpy.conj(mu) |             mu = numpy.conj(mu) | ||||||
|     assert isinstance(epsilon, NDArray[floating] | NDArray[complexfloating]) |  | ||||||
| 
 | 
 | ||||||
|     L, R = meanas.fdfd.operators.e_full_preconditioners(dxes) |     L, R = fdfd_tools.operators.e_full_preconditioners(dxes) | ||||||
|     b_preconditioned = (R if adjoint else L) @ b |  | ||||||
| 
 | 
 | ||||||
|     # |     if adjoint: | ||||||
|     # Allocate GPU memory and load in data |         b_preconditioned = R @ b | ||||||
|     # |     else: | ||||||
|  |         b_preconditioned = L @ b | ||||||
|  | 
 | ||||||
|  |     ''' | ||||||
|  |         Allocate GPU memory and load in data | ||||||
|  |     ''' | ||||||
|     if context is None: |     if context is None: | ||||||
|         context = pyopencl.create_some_context(interactive=True) |         context = pyopencl.create_some_context(interactive=True) | ||||||
| 
 | 
 | ||||||
|     queue = pyopencl.CommandQueue(context) |     queue = pyopencl.CommandQueue(context) | ||||||
| 
 | 
 | ||||||
|     def load_field(v: NDArray[complexfloating | floating], dtype: type = numpy.complex128) -> pyopencl.array.Array: |     def load_field(v, dtype=numpy.complex128): | ||||||
|         return pyopencl.array.to_device(queue, v.astype(dtype)) |         return pyopencl.array.to_device(queue, v.astype(dtype)) | ||||||
| 
 | 
 | ||||||
|     r = load_field(b_preconditioned)  # load preconditioned b into r |     r = load_field(b_preconditioned)  # load preconditioned b into r | ||||||
| @ -132,31 +130,30 @@ def cg_solver( | |||||||
|     rho = 1.0 + 0j |     rho = 1.0 + 0j | ||||||
|     errs = [] |     errs = [] | ||||||
| 
 | 
 | ||||||
|     inv_dxes = [[load_field(1 / numpy.asarray(dd)) for dd in dds] for dds in dxes] |     inv_dxes = [[load_field(1 / d) for d in dd] for dd in dxes] | ||||||
|     oeps = load_field(-omega * omega * epsilon) |     oeps = load_field(-omega ** 2 * epsilon) | ||||||
|     Pl = load_field(L.diagonal()) |     Pl = load_field(L.diagonal()) | ||||||
|     Pr = load_field(R.diagonal()) |     Pr = load_field(R.diagonal()) | ||||||
| 
 | 
 | ||||||
|     if mu is None: |     if mu is None: | ||||||
|         invm = load_field(numpy.array([])) |         invm = load_field(numpy.array([])) | ||||||
|     else: |     else: | ||||||
|         invm = load_field(1 / numpy.asarray(mu)) |         invm = load_field(1 / mu) | ||||||
|         mu = numpy.asarray(mu) |  | ||||||
| 
 | 
 | ||||||
|     if pec is None: |     if pec is None: | ||||||
|         gpec = load_field(numpy.array([]), dtype=numpy.int8) |         gpec = load_field(numpy.array([]), dtype=numpy.int8) | ||||||
|     else: |     else: | ||||||
|         gpec = load_field(numpy.asarray(pec, dtype=bool), dtype=numpy.int8) |         gpec = load_field(pec.astype(bool), dtype=numpy.int8) | ||||||
| 
 | 
 | ||||||
|     if pmc is None: |     if pmc is None: | ||||||
|         gpmc = load_field(numpy.array([]), dtype=numpy.int8) |         gpmc = load_field(numpy.array([]), dtype=numpy.int8) | ||||||
|     else: |     else: | ||||||
|         gpmc = load_field(numpy.asarray(pmc, dtype=bool), dtype=numpy.int8) |         gpmc = load_field(pmc.astype(bool), dtype=numpy.int8) | ||||||
| 
 | 
 | ||||||
|     # |     ''' | ||||||
|     # Generate OpenCL kernels |     Generate OpenCL kernels | ||||||
|     # |     ''' | ||||||
|     has_mu, has_pec, has_pmc = (qq is not None for qq in (mu, pec, pmc)) |     has_mu, has_pec, has_pmc = [q is not None for q in (mu, pec, pmc)] | ||||||
| 
 | 
 | ||||||
|     a_step_full = ops.create_a(context, shape, has_mu, has_pec, has_pmc) |     a_step_full = ops.create_a(context, shape, has_mu, has_pec, has_pmc) | ||||||
|     xr_step = ops.create_xr_step(context) |     xr_step = ops.create_xr_step(context) | ||||||
| @ -164,28 +161,22 @@ def cg_solver( | |||||||
|     p_step = ops.create_p_step(context) |     p_step = ops.create_p_step(context) | ||||||
|     dot = ops.create_dot(context) |     dot = ops.create_dot(context) | ||||||
| 
 | 
 | ||||||
|     def a_step( |     def a_step(E, H, p, events): | ||||||
|             E: pyopencl.array.Array, |  | ||||||
|             H: pyopencl.array.Array, |  | ||||||
|             p: pyopencl.array.Array, |  | ||||||
|             events: list[pyopencl.Event], |  | ||||||
|             ) -> list[pyopencl.Event]: |  | ||||||
|         return a_step_full(E, H, p, inv_dxes, oeps, invm, gpec, gpmc, Pl, Pr, events) |         return a_step_full(E, H, p, inv_dxes, oeps, invm, gpec, gpmc, Pl, Pr, events) | ||||||
| 
 | 
 | ||||||
|     # |     ''' | ||||||
|     # Start the solve |     Start the solve | ||||||
|     # |     ''' | ||||||
|     start_time2 = time.perf_counter() |     start_time2 = time.perf_counter() | ||||||
| 
 | 
 | ||||||
|     _, err2 = rhoerr_step(r, []) |     _, err2 = rhoerr_step(r, []) | ||||||
|     b_norm = numpy.sqrt(err2) |     b_norm = numpy.sqrt(err2) | ||||||
|     logging.debug(f'b_norm check: {b_norm}') |     print('b_norm check: ', b_norm) | ||||||
| 
 | 
 | ||||||
|     success = False |     success = False | ||||||
|     for k in range(max_iters): |     for k in range(max_iters): | ||||||
|         do_print = (k % 100 == 0) |         if verbose: | ||||||
|         if do_print: |             print('[{:06d}] rho {:.4} alpha {:4.4}'.format(k, rho, alpha), end=' ') | ||||||
|             logger.debug(f'[{k:06d}] rho {rho:.4} alpha {alpha:4.4}') |  | ||||||
| 
 | 
 | ||||||
|         rho_prev = rho |         rho_prev = rho | ||||||
|         e = xr_step(x, p, r, v, alpha, []) |         e = xr_step(x, p, r, v, alpha, []) | ||||||
| @ -193,8 +184,8 @@ def cg_solver( | |||||||
| 
 | 
 | ||||||
|         errs += [numpy.sqrt(err2) / b_norm] |         errs += [numpy.sqrt(err2) / b_norm] | ||||||
| 
 | 
 | ||||||
|         if do_print: |         if verbose: | ||||||
|             logger.debug(f'err {errs[-1]}') |             print('err', errs[-1]) | ||||||
| 
 | 
 | ||||||
|         if errs[-1] < err_threshold: |         if errs[-1] < err_threshold: | ||||||
|             success = True |             success = True | ||||||
| @ -205,30 +196,32 @@ def cg_solver( | |||||||
|         alpha = rho / dot(p, v, e) |         alpha = rho / dot(p, v, e) | ||||||
| 
 | 
 | ||||||
|         if k % 1000 == 0: |         if k % 1000 == 0: | ||||||
|             logger.info(f'iteration {k}') |             print(k) | ||||||
| 
 | 
 | ||||||
|     # |     ''' | ||||||
|     # Done solving |     Done solving | ||||||
|     # |     ''' | ||||||
|     time_elapsed = time.perf_counter() - start_time |     time_elapsed = time.perf_counter() - start_time | ||||||
| 
 | 
 | ||||||
|     # Undo preconditioners |     # Undo preconditioners | ||||||
|     x = ((Pl if adjoint else Pr) * x).get() |     if adjoint: | ||||||
|  |         x = (Pl * x).get() | ||||||
|  |     else: | ||||||
|  |         x = (Pr * x).get() | ||||||
| 
 | 
 | ||||||
|     if success: |     if success: | ||||||
|         logger.info('Solve success') |         print('Success', end='') | ||||||
|     else: |     else: | ||||||
|         logger.warning('Solve failure') |         print('Failure', end=', ') | ||||||
|     logger.info(f'{k} iterations in {time_elapsed} sec: {k / time_elapsed} iterations/sec') |     print(', {} iterations in {} sec: {} iterations/sec \ | ||||||
|     logger.debug(f'final error {errs[-1]}') |                   '.format(k, time_elapsed, k / time_elapsed)) | ||||||
|     logger.debug(f'overhead {start_time2 - start_time} sec') |     print('final error', errs[-1]) | ||||||
|  |     print('overhead {} sec'.format(start_time2 - start_time)) | ||||||
| 
 | 
 | ||||||
|     A0 = meanas.fdfd.operators.e_full(omega, dxes, epsilon, mu).tocsr() |     A0 = fdfd_tools.operators.e_full(omega, dxes, epsilon, mu).tocsr() | ||||||
|     if adjoint: |     if adjoint: | ||||||
|         # Remember we conjugated all the contents of A earlier |         # Remember we conjugated all the contents of A earlier | ||||||
|         A0 = A0.T |         A0 = A0.T | ||||||
| 
 |     print('Post-everything residual:', norm(A0 @ x - b) / norm(b)) | ||||||
|     residual = norm(A0 @ x - b) / norm(b) |  | ||||||
|     logger.info(f'Post-everything residual: {residual}') |  | ||||||
|     return x |     return x | ||||||
| 
 | 
 | ||||||
|  | |||||||
| @ -7,11 +7,9 @@ kernels for use by the other solvers. | |||||||
| See kernels/ for any of the .cl files loaded in this file. | See kernels/ for any of the .cl files loaded in this file. | ||||||
| """ | """ | ||||||
| 
 | 
 | ||||||
| from collections.abc import Callable, Sequence | from typing import List, Callable | ||||||
| import logging |  | ||||||
| 
 | 
 | ||||||
| import numpy | import numpy | ||||||
| from numpy.typing import ArrayLike |  | ||||||
| import jinja2 | import jinja2 | ||||||
| 
 | 
 | ||||||
| import pyopencl | import pyopencl | ||||||
| @ -20,77 +18,59 @@ from pyopencl.elementwise import ElementwiseKernel | |||||||
| from pyopencl.reduction import ReductionKernel | from pyopencl.reduction import ReductionKernel | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| from .csr import CSRMatrix |  | ||||||
| 
 |  | ||||||
| 
 |  | ||||||
| logger = logging.getLogger(__name__) |  | ||||||
| 
 |  | ||||||
| 
 |  | ||||||
| class FDFDError(Exception): |  | ||||||
|     """ Custom error for opencl_fdfd """ |  | ||||||
|     pass |  | ||||||
| 
 |  | ||||||
| # Create jinja2 env on module load | # Create jinja2 env on module load | ||||||
| jinja_env = jinja2.Environment(loader=jinja2.PackageLoader(__name__, 'kernels')) | jinja_env = jinja2.Environment(loader=jinja2.PackageLoader(__name__, 'kernels')) | ||||||
| 
 | 
 | ||||||
| # Return type for the create_opname(...) functions | # Return type for the create_opname(...) functions | ||||||
| operation = Callable[..., list[pyopencl.Event]] | operation = Callable[..., List[pyopencl.Event]] | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| def type_to_C( | def type_to_C(float_type: numpy.float32 or numpy.float64) -> str: | ||||||
|         float_type: type[numpy.floating | numpy.complexfloating], |  | ||||||
|         ) -> str: |  | ||||||
|     """ |     """ | ||||||
|     Returns a string corresponding to the C equivalent of a numpy type. |     Returns a string corresponding to the C equivalent of a numpy type. | ||||||
| 
 | 
 | ||||||
|     Args: |     :param float_type: numpy type: float32, float64, complex64, complex128 | ||||||
|         float_type: numpy type: float32, float64, complex64, complex128 |     :return: string containing the corresponding C type (eg. 'double') | ||||||
| 
 |  | ||||||
|     Returns: |  | ||||||
|         string containing the corresponding C type (eg. 'double') |  | ||||||
|     """ |     """ | ||||||
|     types = { |     types = { | ||||||
|         numpy.float16: 'half', |  | ||||||
|         numpy.float32: 'float', |         numpy.float32: 'float', | ||||||
|         numpy.float64: 'double', |         numpy.float64: 'double', | ||||||
|         numpy.complex64: 'cfloat_t', |         numpy.complex64: 'cfloat_t', | ||||||
|         numpy.complex128: 'cdouble_t', |         numpy.complex128: 'cdouble_t', | ||||||
|     } |     } | ||||||
|     if float_type not in types: |     if float_type not in types: | ||||||
|         raise FDFDError('Unsupported type') |         raise Exception('Unsupported type') | ||||||
| 
 | 
 | ||||||
|     return types[float_type] |     return types[float_type] | ||||||
| 
 | 
 | ||||||
| 
 |  | ||||||
| # Type names | # Type names | ||||||
| ctype = type_to_C(numpy.complex128) | ctype = type_to_C(numpy.complex128) | ||||||
| ctype_bare = 'cdouble' | ctype_bare = 'cdouble' | ||||||
| 
 | 
 | ||||||
| # Preamble for all OpenCL code | # Preamble for all OpenCL code | ||||||
| preamble = f''' | preamble = ''' | ||||||
| #define PYOPENCL_DEFINE_CDOUBLE | #define PYOPENCL_DEFINE_CDOUBLE | ||||||
| #include <pyopencl-complex.h> | #include <pyopencl-complex.h> | ||||||
| 
 | 
 | ||||||
| //Defines to clean up operation and type names | //Defines to clean up operation and type names | ||||||
| #define ctype {ctype_bare}_t | #define ctype {ctype}_t | ||||||
| #define zero {ctype_bare}_new(0.0, 0.0) | #define zero {ctype}_new(0.0, 0.0) | ||||||
| #define add {ctype_bare}_add | #define add {ctype}_add | ||||||
| #define sub {ctype_bare}_sub | #define sub {ctype}_sub | ||||||
| #define mul {ctype_bare}_mul | #define mul {ctype}_mul | ||||||
| ''' | '''.format(ctype=ctype_bare) | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| def ptrs(*args: str) -> list[str]: | def ptrs(*args: str) -> List[str]: | ||||||
|     return [ctype + ' *' + s for s in args] |     return [ctype + ' *' + s for s in args] | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| def create_a( | def create_a(context: pyopencl.Context, | ||||||
|         context: pyopencl.Context, |              shape: numpy.ndarray, | ||||||
|         shape: ArrayLike, |              mu: bool = False, | ||||||
|         mu: bool = False, |              pec: bool = False, | ||||||
|         pec: bool = False, |              pmc: bool = False, | ||||||
|         pmc: bool = False, |              ) -> operation: | ||||||
|         ) -> operation: |  | ||||||
|     """ |     """ | ||||||
|     Return a function which performs (A @ p), where A is the FDFD wave equation for E-field. |     Return a function which performs (A @ p), where A is the FDFD wave equation for E-field. | ||||||
| 
 | 
 | ||||||
| @ -111,15 +91,12 @@ def create_a( | |||||||
| 
 | 
 | ||||||
|      and returns a list of pyopencl.Event. |      and returns a list of pyopencl.Event. | ||||||
| 
 | 
 | ||||||
|     Args: |     :param context: PyOpenCL context | ||||||
|         context: PyOpenCL context |     :param shape: Dimensions of the E-field | ||||||
|         shape: Dimensions of the E-field |     :param mu: False iff (mu == 1) everywhere | ||||||
|         mu: False iff (mu == 1) everywhere |     :param pec: False iff no PEC anywhere | ||||||
|         pec: False iff no PEC anywhere |     :param pmc: False iff no PMC anywhere | ||||||
|         pmc: False iff no PMC anywhere |     :return: Function for computing (A @ p) | ||||||
| 
 |  | ||||||
|     Returns: |  | ||||||
|         Function for computing (A @ p) |  | ||||||
|     """ |     """ | ||||||
| 
 | 
 | ||||||
|     common_source = jinja_env.get_template('common.cl').render(shape=shape) |     common_source = jinja_env.get_template('common.cl').render(shape=shape) | ||||||
| @ -129,72 +106,45 @@ def create_a( | |||||||
|     des = [ctype + ' *inv_de' + a for a in 'xyz'] |     des = [ctype + ' *inv_de' + a for a in 'xyz'] | ||||||
|     dhs = [ctype + ' *inv_dh' + a for a in 'xyz'] |     dhs = [ctype + ' *inv_dh' + a for a in 'xyz'] | ||||||
| 
 | 
 | ||||||
|     # |     ''' | ||||||
|     # Convert p to initial E (ie, apply right preconditioner and PEC) |     Convert p to initial E (ie, apply right preconditioner and PEC) | ||||||
|     # |     ''' | ||||||
|     p2e_source = jinja_env.get_template('p2e.cl').render(pec=pec) |     p2e_source = jinja_env.get_template('p2e.cl').render(pec=pec) | ||||||
|     P2E_kernel = ElementwiseKernel( |     P2E_kernel = ElementwiseKernel(context, | ||||||
|         context, |                                    name='P2E', | ||||||
|         name='P2E', |                                    preamble=preamble, | ||||||
|         preamble=preamble, |                                    operation=p2e_source, | ||||||
|         operation=p2e_source, |                                    arguments=', '.join(ptrs('E', 'p', 'Pr') + pec_arg)) | ||||||
|         arguments=', '.join(ptrs('E', 'p', 'Pr') + pec_arg), |  | ||||||
|         ) |  | ||||||
| 
 | 
 | ||||||
|     # |     ''' | ||||||
|     # Calculate intermediate H from intermediate E |     Calculate intermediate H from intermediate E | ||||||
|     # |     ''' | ||||||
|     e2h_source = jinja_env.get_template('e2h.cl').render( |     e2h_source = jinja_env.get_template('e2h.cl').render(mu=mu, | ||||||
|         mu=mu, |                                                          pmc=pmc, | ||||||
|         pmc=pmc, |                                                          common_cl=common_source) | ||||||
|         common_cl=common_source, |     E2H_kernel = ElementwiseKernel(context, | ||||||
|         ) |                                    name='E2H', | ||||||
|     E2H_kernel = ElementwiseKernel( |                                    preamble=preamble, | ||||||
|         context, |                                    operation=e2h_source, | ||||||
|         name='E2H', |                                    arguments=', '.join(ptrs('E', 'H', 'inv_mu') + pmc_arg + des)) | ||||||
|         preamble=preamble, |  | ||||||
|         operation=e2h_source, |  | ||||||
|         arguments=', '.join(ptrs('E', 'H', 'inv_mu') + pmc_arg + des), |  | ||||||
|         ) |  | ||||||
| 
 | 
 | ||||||
|     # |     ''' | ||||||
|     # Calculate final E (including left preconditioner) |     Calculate final E (including left preconditioner) | ||||||
|     # |     ''' | ||||||
|     h2e_source = jinja_env.get_template('h2e.cl').render( |     h2e_source = jinja_env.get_template('h2e.cl').render(pec=pec, | ||||||
|         pec=pec, |                                                          common_cl=common_source) | ||||||
|         common_cl=common_source, |     H2E_kernel = ElementwiseKernel(context, | ||||||
|         ) |                                    name='H2E', | ||||||
|     H2E_kernel = ElementwiseKernel( |                                    preamble=preamble, | ||||||
|         context, |                                    operation=h2e_source, | ||||||
|         name='H2E', |                                    arguments=', '.join(ptrs('E', 'H', 'oeps', 'Pl') + pec_arg + dhs)) | ||||||
|         preamble=preamble, |  | ||||||
|         operation=h2e_source, |  | ||||||
|         arguments=', '.join(ptrs('E', 'H', 'oeps', 'Pl') + pec_arg + dhs), |  | ||||||
|         ) |  | ||||||
| 
 | 
 | ||||||
|     def spmv( |     def spmv(E, H, p, idxes, oeps, inv_mu, pec, pmc, Pl, Pr, e): | ||||||
|             E: pyopencl.array.Array, |  | ||||||
|             H: pyopencl.array.Array, |  | ||||||
|             p: pyopencl.array.Array, |  | ||||||
|             idxes: Sequence[Sequence[pyopencl.array.Array]], |  | ||||||
|             oeps: pyopencl.array.Array, |  | ||||||
|             inv_mu: pyopencl.array.Array | None, |  | ||||||
|             pec: pyopencl.array.Array | None, |  | ||||||
|             pmc: pyopencl.array.Array | None, |  | ||||||
|             Pl: pyopencl.array.Array, |  | ||||||
|             Pr: pyopencl.array.Array, |  | ||||||
|             e: list[pyopencl.Event], |  | ||||||
|             ) -> list[pyopencl.Event]: |  | ||||||
|         e2 = P2E_kernel(E, p, Pr, pec, wait_for=e) |         e2 = P2E_kernel(E, p, Pr, pec, wait_for=e) | ||||||
|         e2 = E2H_kernel(E, H, inv_mu, pmc, *idxes[0], wait_for=[e2]) |         e2 = E2H_kernel(E, H, inv_mu, pmc, *idxes[0], wait_for=[e2]) | ||||||
|         e2 = H2E_kernel(E, H, oeps, Pl, pec, *idxes[1], wait_for=[e2]) |         e2 = H2E_kernel(E, H, oeps, Pl, pec, *idxes[1], wait_for=[e2]) | ||||||
|         return [e2] |         return [e2] | ||||||
| 
 | 
 | ||||||
|     logger.debug(f'Preamble: \n{preamble}') |  | ||||||
|     logger.debug(f'p2e: \n{p2e_source}') |  | ||||||
|     logger.debug(f'e2h: \n{e2h_source}') |  | ||||||
|     logger.debug(f'h2e: \n{h2e_source}') |  | ||||||
| 
 |  | ||||||
|     return spmv |     return spmv | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| @ -209,11 +159,8 @@ def create_xr_step(context: pyopencl.Context) -> operation: | |||||||
|     after waiting for all in the list e |     after waiting for all in the list e | ||||||
|     and returns a list of pyopencl.Event |     and returns a list of pyopencl.Event | ||||||
| 
 | 
 | ||||||
|     Args: |     :param context: PyOpenCL context | ||||||
|         context: PyOpenCL context |     :return: Function for performing x and r updates | ||||||
| 
 |  | ||||||
|     Returns: |  | ||||||
|         Function for performing x and r updates |  | ||||||
|     """ |     """ | ||||||
|     update_xr_source = ''' |     update_xr_source = ''' | ||||||
|     x[i] = add(x[i], mul(alpha, p[i])); |     x[i] = add(x[i], mul(alpha, p[i])); | ||||||
| @ -222,28 +169,19 @@ def create_xr_step(context: pyopencl.Context) -> operation: | |||||||
| 
 | 
 | ||||||
|     xr_args = ', '.join(ptrs('x', 'p', 'r', 'v') + [ctype + ' alpha']) |     xr_args = ', '.join(ptrs('x', 'p', 'r', 'v') + [ctype + ' alpha']) | ||||||
| 
 | 
 | ||||||
|     xr_kernel = ElementwiseKernel( |     xr_kernel = ElementwiseKernel(context, | ||||||
|         context, |                                   name='XR', | ||||||
|         name='XR', |                                   preamble=preamble, | ||||||
|         preamble=preamble, |                                   operation=update_xr_source, | ||||||
|         operation=update_xr_source, |                                   arguments=xr_args) | ||||||
|         arguments=xr_args, |  | ||||||
|         ) |  | ||||||
| 
 | 
 | ||||||
|     def xr_update( |     def xr_update(x, p, r, v, alpha, e): | ||||||
|             x: pyopencl.array.Array, |  | ||||||
|             p: pyopencl.array.Array, |  | ||||||
|             r: pyopencl.array.Array, |  | ||||||
|             v: pyopencl.array.Array, |  | ||||||
|             alpha: complex, |  | ||||||
|             e: list[pyopencl.Event], |  | ||||||
|             ) -> list[pyopencl.Event]: |  | ||||||
|         return [xr_kernel(x, p, r, v, alpha, wait_for=e)] |         return [xr_kernel(x, p, r, v, alpha, wait_for=e)] | ||||||
| 
 | 
 | ||||||
|     return xr_update |     return xr_update | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| def create_rhoerr_step(context: pyopencl.Context) -> Callable[..., tuple[complex, complex]]: | def create_rhoerr_step(context: pyopencl.Context) -> operation: | ||||||
|     """ |     """ | ||||||
|     Return a function |     Return a function | ||||||
|      ri_update(r, e) |      ri_update(r, e) | ||||||
| @ -254,11 +192,8 @@ def create_rhoerr_step(context: pyopencl.Context) -> Callable[..., tuple[complex | |||||||
|     after waiting for all pyopencl.Event in the list e |     after waiting for all pyopencl.Event in the list e | ||||||
|     and returns a list of pyopencl.Event |     and returns a list of pyopencl.Event | ||||||
| 
 | 
 | ||||||
|     Args: |     :param context: PyOpenCL context | ||||||
|         context: PyOpenCL context |     :return: Function for performing x and r updates | ||||||
| 
 |  | ||||||
|     Returns: |  | ||||||
|         Function for performing x and r updates |  | ||||||
|     """ |     """ | ||||||
| 
 | 
 | ||||||
|     update_ri_source = ''' |     update_ri_source = ''' | ||||||
| @ -270,20 +205,18 @@ def create_rhoerr_step(context: pyopencl.Context) -> Callable[..., tuple[complex | |||||||
|     # Use a vector type (double3) to make the reduction simpler |     # Use a vector type (double3) to make the reduction simpler | ||||||
|     ri_dtype = pyopencl.array.vec.double3 |     ri_dtype = pyopencl.array.vec.double3 | ||||||
| 
 | 
 | ||||||
|     ri_kernel = ReductionKernel( |     ri_kernel = ReductionKernel(context, | ||||||
|         context, |                                 name='RHOERR', | ||||||
|         name='RHOERR', |                                 preamble=preamble, | ||||||
|         preamble=preamble, |                                 dtype_out=ri_dtype, | ||||||
|         dtype_out=ri_dtype, |                                 neutral='(double3)(0.0, 0.0, 0.0)', | ||||||
|         neutral='(double3)(0.0, 0.0, 0.0)', |                                 map_expr=update_ri_source, | ||||||
|         map_expr=update_ri_source, |                                 reduce_expr='a+b', | ||||||
|         reduce_expr='a+b', |                                 arguments=ctype + ' *r') | ||||||
|         arguments=ctype + ' *r', |  | ||||||
|         ) |  | ||||||
| 
 | 
 | ||||||
|     def ri_update(r: pyopencl.array.Array, e: list[pyopencl.Event]) -> tuple[complex, complex]: |     def ri_update(r, e): | ||||||
|         g = ri_kernel(r, wait_for=e).astype(ri_dtype).get() |         g = ri_kernel(r, wait_for=e).astype(ri_dtype).get() | ||||||
|         rr, ri, ii = (g[qq] for qq in 'xyz') |         rr, ri, ii = [g[q] for q in 'xyz'] | ||||||
|         rho = rr + 2j * ri - ii |         rho = rr + 2j * ri - ii | ||||||
|         err = rr + ii |         err = rr + ii | ||||||
|         return rho, err |         return rho, err | ||||||
| @ -301,66 +234,48 @@ def create_p_step(context: pyopencl.Context) -> operation: | |||||||
|     after waiting for all pyopencl.Event in the list e |     after waiting for all pyopencl.Event in the list e | ||||||
|     and returns a list of pyopencl.Event |     and returns a list of pyopencl.Event | ||||||
| 
 | 
 | ||||||
|     Args: |     :param context: PyOpenCL context | ||||||
|         context: PyOpenCL context |     :return: Function for performing the p update | ||||||
| 
 |  | ||||||
|     Returns: |  | ||||||
|         Function for performing the p update |  | ||||||
|     """ |     """ | ||||||
|     update_p_source = ''' |     update_p_source = ''' | ||||||
|     p[i] = add(r[i], mul(beta, p[i])); |     p[i] = add(r[i], mul(beta, p[i])); | ||||||
|     ''' |     ''' | ||||||
|     p_args = ptrs('p', 'r') + [ctype + ' beta'] |     p_args = ptrs('p', 'r') + [ctype + ' beta'] | ||||||
| 
 | 
 | ||||||
|     p_kernel = ElementwiseKernel( |     p_kernel = ElementwiseKernel(context, | ||||||
|         context, |                                  name='P', | ||||||
|         name='P', |                                  preamble=preamble, | ||||||
|         preamble=preamble, |                                  operation=update_p_source, | ||||||
|         operation=update_p_source, |                                  arguments=', '.join(p_args)) | ||||||
|         arguments=', '.join(p_args), |  | ||||||
|         ) |  | ||||||
| 
 | 
 | ||||||
|     def p_update( |     def p_update(p, r, beta, e): | ||||||
|             p: pyopencl.array.Array, |  | ||||||
|             r: pyopencl.array.Array, |  | ||||||
|             beta: complex, |  | ||||||
|             e: list[pyopencl.Event]) -> list[pyopencl.Event]: |  | ||||||
|         return [p_kernel(p, r, beta, wait_for=e)] |         return [p_kernel(p, r, beta, wait_for=e)] | ||||||
| 
 | 
 | ||||||
|     return p_update |     return p_update | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| def create_dot(context: pyopencl.Context) -> Callable[..., complex]: | def create_dot(context: pyopencl.Context) -> operation: | ||||||
|     """ |     """ | ||||||
|     Return a function for performing the dot product |     Return a function for performing the dot product | ||||||
|      p @ v |      p @ v | ||||||
|     with the signature |     with the signature | ||||||
|      dot(p, v, e) -> complex |      dot(p, v, e) -> float | ||||||
| 
 | 
 | ||||||
|     Args: |     :param context: PyOpenCL context | ||||||
|         context: PyOpenCL context |     :return: Function for performing the dot product | ||||||
| 
 |  | ||||||
|     Returns: |  | ||||||
|         Function for performing the dot product |  | ||||||
|     """ |     """ | ||||||
|     dot_dtype = numpy.complex128 |     dot_dtype = numpy.complex128 | ||||||
| 
 | 
 | ||||||
|     dot_kernel = ReductionKernel( |     dot_kernel = ReductionKernel(context, | ||||||
|         context, |                                  name='dot', | ||||||
|         name='dot', |                                  preamble=preamble, | ||||||
|         preamble=preamble, |                                  dtype_out=dot_dtype, | ||||||
|         dtype_out=dot_dtype, |                                  neutral='zero', | ||||||
|         neutral='zero', |                                  map_expr='mul(p[i], v[i])', | ||||||
|         map_expr='mul(p[i], v[i])', |                                  reduce_expr='add(a, b)', | ||||||
|         reduce_expr='add(a, b)', |                                  arguments=ptrs('p', 'v')) | ||||||
|         arguments=ptrs('p', 'v'), |  | ||||||
|         ) |  | ||||||
| 
 | 
 | ||||||
|     def dot( |     def dot(p, v, e): | ||||||
|             p: pyopencl.array.Array, |  | ||||||
|             v: pyopencl.array.Array, |  | ||||||
|             e: list[pyopencl.Event], |  | ||||||
|             ) -> complex: |  | ||||||
|         g = dot_kernel(p, v, wait_for=e) |         g = dot_kernel(p, v, wait_for=e) | ||||||
|         return g.get() |         return g.get() | ||||||
| 
 | 
 | ||||||
| @ -381,11 +296,8 @@ def create_a_csr(context: pyopencl.Context) -> operation: | |||||||
|     The function waits on all the pyopencl.Event in e before running, and returns |     The function waits on all the pyopencl.Event in e before running, and returns | ||||||
|      a list of pyopencl.Event. |      a list of pyopencl.Event. | ||||||
| 
 | 
 | ||||||
|     Args: |     :param context: PyOpenCL context | ||||||
|         context: PyOpenCL context |     :return: Function for sparse (M @ v) operation where M is in CSR format | ||||||
| 
 |  | ||||||
|     Returns: |  | ||||||
|         Function for sparse (M @ v) operation where M is in CSR format |  | ||||||
|     """ |     """ | ||||||
|     spmv_source = ''' |     spmv_source = ''' | ||||||
|     int start = m_row_ptr[i]; |     int start = m_row_ptr[i]; | ||||||
| @ -406,20 +318,13 @@ def create_a_csr(context: pyopencl.Context) -> operation: | |||||||
|     m_args = 'int *m_row_ptr, int *m_col_ind, ' + ctype + ' *m_data' |     m_args = 'int *m_row_ptr, int *m_col_ind, ' + ctype + ' *m_data' | ||||||
|     v_in_args = ctype + ' *v_in' |     v_in_args = ctype + ' *v_in' | ||||||
| 
 | 
 | ||||||
|     spmv_kernel = ElementwiseKernel( |     spmv_kernel = ElementwiseKernel(context, | ||||||
|         context, |                                     name='csr_spmv', | ||||||
|         name='csr_spmv', |                                     preamble=preamble, | ||||||
|         preamble=preamble, |                                     operation=spmv_source, | ||||||
|         operation=spmv_source, |                                     arguments=', '.join((v_out_args, m_args, v_in_args))) | ||||||
|         arguments=', '.join((v_out_args, m_args, v_in_args)), |  | ||||||
|         ) |  | ||||||
| 
 | 
 | ||||||
|     def spmv( |     def spmv(v_out, m, v_in, e): | ||||||
|             v_out: pyopencl.array.Array, |  | ||||||
|             m: CSRMatrix, |  | ||||||
|             v_in: pyopencl.array.Array, |  | ||||||
|             e: list[pyopencl.Event], |  | ||||||
|             ) -> list[pyopencl.Event]: |  | ||||||
|         return [spmv_kernel(v_out, m.row_ptr, m.col_ind, m.data, v_in, wait_for=e)] |         return [spmv_kernel(v_out, m.row_ptr, m.col_ind, m.data, v_in, wait_for=e)] | ||||||
| 
 | 
 | ||||||
|     return spmv |     return spmv | ||||||
|  | |||||||
| @ -1,96 +0,0 @@ | |||||||
| [build-system] |  | ||||||
| requires = ["hatchling"] |  | ||||||
| build-backend = "hatchling.build" |  | ||||||
| 
 |  | ||||||
| [project] |  | ||||||
| name = "opencl_fdfd" |  | ||||||
| description = "OpenCL FDFD solver" |  | ||||||
| readme = "README.md" |  | ||||||
| license = { file = "LICENSE.md" } |  | ||||||
| authors = [ |  | ||||||
|     { name="Jan Petykiewicz", email="jan@mpxd.net" }, |  | ||||||
|     ] |  | ||||||
| homepage = "https://mpxd.net/code/jan/opencl_fdfd" |  | ||||||
| repository = "https://mpxd.net/code/jan/opencl_fdfd" |  | ||||||
| keywords = [ |  | ||||||
|     "FDFD", |  | ||||||
|     "finite", |  | ||||||
|     "difference", |  | ||||||
|     "frequency", |  | ||||||
|     "domain", |  | ||||||
|     "simulation", |  | ||||||
|     "optics", |  | ||||||
|     "electromagnetic", |  | ||||||
|     "dielectric", |  | ||||||
|     "PML", |  | ||||||
|     "solver", |  | ||||||
|     "FDTD", |  | ||||||
|     ] |  | ||||||
| classifiers = [ |  | ||||||
|     "Programming Language :: Python :: 3", |  | ||||||
|     "Development Status :: 4 - Beta", |  | ||||||
|     "Intended Audience :: Developers", |  | ||||||
|     "Intended Audience :: Manufacturing", |  | ||||||
|     "Intended Audience :: Science/Research", |  | ||||||
|     "License :: OSI Approved :: GNU Affero General Public License v3", |  | ||||||
|     "Topic :: Scientific/Engineering", |  | ||||||
|     ] |  | ||||||
| requires-python = ">=3.11" |  | ||||||
| dynamic = ["version"] |  | ||||||
| dependencies = [ |  | ||||||
|     "numpy>=1.26", |  | ||||||
|     "pyopencl", |  | ||||||
|     "jinja2", |  | ||||||
|     "meanas>=0.5", |  | ||||||
|     ] |  | ||||||
| 
 |  | ||||||
| [tool.hatch.version] |  | ||||||
| path = "opencl_fdfd/__init__.py" |  | ||||||
| 
 |  | ||||||
| 
 |  | ||||||
| [tool.ruff] |  | ||||||
| exclude = [ |  | ||||||
|     ".git", |  | ||||||
|     "dist", |  | ||||||
|     ] |  | ||||||
| line-length = 145 |  | ||||||
| indent-width = 4 |  | ||||||
| lint.dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$" |  | ||||||
| lint.select = [ |  | ||||||
|     "NPY", "E", "F", "W", "B", "ANN", "UP", "SLOT", "SIM", "LOG", |  | ||||||
|     "C4", "ISC", "PIE", "PT", "RET", "TCH", "PTH", "INT", |  | ||||||
|     "ARG", "PL", "R", "TRY", |  | ||||||
|     "G010", "G101", "G201", "G202", |  | ||||||
|     "Q002", "Q003", "Q004", |  | ||||||
|     ] |  | ||||||
| lint.ignore = [ |  | ||||||
|     #"ANN001",   # No annotation |  | ||||||
|     "ANN002",   # *args |  | ||||||
|     "ANN003",   # **kwargs |  | ||||||
|     "ANN401",   # Any |  | ||||||
|     "ANN101",   # self: Self |  | ||||||
|     "SIM108",   # single-line if / else assignment |  | ||||||
|     "RET504",   # x=y+z; return x |  | ||||||
|     "PIE790",   # unnecessary pass |  | ||||||
|     "ISC003",   # non-implicit string concatenation |  | ||||||
|     "C408",     # dict(x=y) instead of {'x': y} |  | ||||||
|     "PLR09",    # Too many xxx |  | ||||||
|     "PLR2004",  # magic number |  | ||||||
|     "PLC0414",  # import x as x |  | ||||||
|     "TRY003",   # Long exception message |  | ||||||
|     ] |  | ||||||
| 
 |  | ||||||
| 
 |  | ||||||
| [[tool.mypy.overrides]] |  | ||||||
| module = [ |  | ||||||
|     "scipy", |  | ||||||
|     "scipy.optimize", |  | ||||||
|     "scipy.linalg", |  | ||||||
|     "scipy.sparse", |  | ||||||
|     "scipy.sparse.linalg", |  | ||||||
|     "pyopencl", |  | ||||||
|     "pyopencl.array", |  | ||||||
|     "pyopencl.elementwise", |  | ||||||
|     "pyopencl.reduction", |  | ||||||
|     ] |  | ||||||
| ignore_missing_imports = true |  | ||||||
							
								
								
									
										24
									
								
								setup.py
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										24
									
								
								setup.py
									
									
									
									
									
										Normal file
									
								
							| @ -0,0 +1,24 @@ | |||||||
|  | #!/usr/bin/env python | ||||||
|  | 
 | ||||||
|  | from setuptools import setup, find_packages | ||||||
|  | 
 | ||||||
|  | setup(name='opencl_fdfd', | ||||||
|  |       version='0.3', | ||||||
|  |       description='OpenCL FDFD solver', | ||||||
|  |       author='Jan Petykiewicz', | ||||||
|  |       author_email='anewusername@gmail.com', | ||||||
|  |       url='https://mpxd.net/gogs/jan/opencl_fdfd', | ||||||
|  |       packages=find_packages(), | ||||||
|  |       package_data={ | ||||||
|  |           'opencl_fdfd': ['kernels/*'] | ||||||
|  |       }, | ||||||
|  |       install_requires=[ | ||||||
|  |             'numpy', | ||||||
|  |             'pyopencl', | ||||||
|  |             'jinja2', | ||||||
|  |             'fdfd_tools>=0.3', | ||||||
|  |       ], | ||||||
|  |       extras_require={ | ||||||
|  |       }, | ||||||
|  |       ) | ||||||
|  | 
 | ||||||
		Loading…
	
	
			
			x
			
			
		
	
		Reference in New Issue
	
	Block a user