Compare commits

..

21 Commits

Author SHA1 Message Date
8f294d8cc8 bump dependency versions 2024-07-30 22:44:02 -07:00
32b5063019 add ruff and mypy configs 2024-07-30 22:43:53 -07:00
c3646b2fd2 use a custom exception 2024-07-30 22:43:35 -07:00
d72c5e254f misc cleanup 2024-07-30 22:43:29 -07:00
9282bfe8c0 use f-string 2024-07-30 22:42:58 -07:00
684557d479 use asarray() in place of array(copy=False) 2024-07-30 22:42:49 -07:00
6193a9c256 improve type annotations 2024-07-30 22:41:27 -07:00
2f7a46ff71 "import x as x" for re-exported names 2024-07-30 22:38:57 -07:00
jan
c42ce49983 add half-precision to type_to_C 2022-11-24 22:33:10 -08:00
b8ea859106 bump version to 0.4 2022-11-20 21:58:48 -08:00
3d8054ba40 use f-strings everywhere 2022-11-20 21:57:54 -08:00
efeb29479b improve type annotations, formatting, comment styles 2022-11-20 21:57:43 -08:00
81bb1dd2c0 add caches to .gitignore 2022-11-20 19:38:34 -08:00
e41a71ce6f move to hatch-based build 2022-11-20 19:38:21 -08:00
cba31bf081 use new email 2021-07-11 17:11:19 -07:00
77f53affe7 use VERSION.py instead of importing package before it's installed 2021-07-11 17:11:17 -07:00
0ebfa030c4 README fixes 2021-07-11 17:10:38 -07:00
5861767a00 depend on meanas instad of fdfd_tools 2021-07-11 17:09:55 -07:00
jan
792b161753 avoid importing the package before its installed... 2020-07-03 13:48:47 -07:00
jan
66c30e6eab move from fdfd_tools to meanas 2020-07-03 13:48:24 -07:00
jan
2130b015fd readme updates 2020-07-03 13:46:38 -07:00
11 changed files with 472 additions and 294 deletions

3
.gitignore vendored
View File

@ -60,3 +60,6 @@ target/
# PyCharm # PyCharm
.idea/ .idea/
.mypy_cache/
.pytest_cache/

View File

@ -6,10 +6,10 @@ electromagnetic solver implemented in Python and OpenCL.
**Capabilities:** **Capabilities:**
* Arbitrary distributions of the following: * Arbitrary distributions of the following:
* Dielectric constant (```epsilon```) * Dielectric constant (`epsilon`)
* Magnetic permeabilty (```mu```) * Magnetic permeabilty (`mu`)
* Perfect electric conductor (```PEC```) * Perfect electric conductor (`PEC`)
* Perfect magnetic conductor (```PMC```) * Perfect magnetic conductor (`PMC`)
* Variable-sized rectangular grids * Variable-sized rectangular grids
* Stretched-coordinate PMLs (complex cell sizes allowed) * Stretched-coordinate PMLs (complex cell sizes allowed)
@ -17,10 +17,10 @@ Currently, only periodic boundary conditions are included.
PEC/PMC boundaries can be implemented by drawing PEC/PMC cells near the edges. PEC/PMC boundaries can be implemented by drawing PEC/PMC cells near the edges.
Bloch boundary conditions are not included but wouldn't be very hard to add. Bloch boundary conditions are not included but wouldn't be very hard to add.
The default solver ```opencl_fdfd.cg_solver(...)``` located in main.py The default solver `opencl_fdfd.cg_solver(...)` located in main.py
implements the E-field wave operator directly (ie, as a list of OpenCL implements the E-field wave operator directly (ie, as a list of OpenCL
instructions rather than a matrix). Additionally, there is a slower instructions rather than a matrix). Additionally, there is a slower
(and slightly more versatile) solver in ```csr.py``` which attempts to solve (and slightly more versatile) solver in `csr.py` which attempts to solve
an arbitrary sparse matrix in compressed sparse row (CSR) format using an arbitrary sparse matrix in compressed sparse row (CSR) format using
the same conjugate gradient method as the default solver. The CSR solver the same conjugate gradient method as the default solver. The CSR solver
is significantly slower, but can be very useful for testing alternative is significantly slower, but can be very useful for testing alternative
@ -34,11 +34,11 @@ generalization to multiple GPUs should be pretty straightforward
## Installation ## Installation
**Dependencies:** **Dependencies:**
* python 3 (written and tested with 3.5) * python 3 (written and tested with 3.7)
* numpy * numpy
* pyopencl * pyopencl
* jinja2 * jinja2
* [fdfd_tools](https://mpxd.net/code/jan/fdfd_tools) (>=0.2) * [meanas](https://mpxd.net/code/jan/meanas) (>=0.5)
Install with pip, via git: Install with pip, via git:
@ -49,14 +49,14 @@ pip install git+https://mpxd.net/code/jan/opencl_fdfd.git@release
## Use ## Use
See the documentation for ```opencl_fdfd.cg_solver(...)``` See the documentation for `opencl_fdfd.cg_solver(...)`
(located in ```main.py```) for details about how to call the solver. (located in ```main.py```) for details about how to call the solver.
The FDFD arguments are identical to those in The FDFD arguments are identical to those in
```fdfd_tools.solvers.generic(...)```, and a few solver-specific `meanas.solvers.generic(...)`, and a few solver-specific
arguments are available. arguments are available.
An alternate (slower) FDFD solver and a general gpu-based sparse matrix An alternate (slower) FDFD solver and a general gpu-based sparse matrix
solver is available in ```csr.py```. These aren't particularly solver is available in `csr.py`. These aren't particularly
well-optimized, and something like well-optimized, and something like
[MAGMA](http://icl.cs.utk.edu/magma/index.html) would probably be a [MAGMA](http://icl.cs.utk.edu/magma/index.html) would probably be a
better choice if you absolutely need to solve arbitrary sparse matrices better choice if you absolutely need to solve arbitrary sparse matrices

1
opencl_fdfd/LICENSE.md Symbolic link
View File

@ -0,0 +1 @@
../LICENSE.md

1
opencl_fdfd/README.md Symbolic link
View File

@ -0,0 +1 @@
../README.md

View File

@ -31,14 +31,14 @@
Dependencies: Dependencies:
- fdfd_tools ( https://mpxd.net/code/jan/fdfd_tools ) - meanas ( https://mpxd.net/code/jan/meanas )
- numpy - numpy
- pyopencl - pyopencl
- jinja2 - jinja2
""" """
from .main import cg_solver from .main import cg_solver as cg_solver
__author__ = 'Jan Petykiewicz' __author__ = 'Jan Petykiewicz'
__version__ = '0.4'
version = '0.3' version = __version__

View File

@ -6,7 +6,7 @@ CSRMatrix sparse matrix representation.
The FDFD solver (fdfd_cg_solver()) solves an FDFD problem by The FDFD solver (fdfd_cg_solver()) solves an FDFD problem by
creating a sparse matrix representing the problem (using creating a sparse matrix representing the problem (using
fdfd_tools) and then passing it to cg(), which performs a meanas) and then passing it to cg(), which performs a
conjugate gradient solve. conjugate gradient solve.
cg() is capable of solving arbitrary sparse matrices which cg() is capable of solving arbitrary sparse matrices which
@ -14,58 +14,66 @@ satisfy the constraints for the 'conjugate gradient' algorithm
(positive definite, symmetric) and some that don't. (positive definite, symmetric) and some that don't.
""" """
from typing import Dict, Any from typing import Any, TYPE_CHECKING
import time import time
import logging import logging
import numpy import numpy
from numpy.typing import NDArray, ArrayLike
from numpy.linalg import norm from numpy.linalg import norm
from numpy import complexfloating
import pyopencl import pyopencl
import pyopencl.array import pyopencl.array
import meanas.fdfd.solvers
import fdfd_tools.solvers
from . import ops from . import ops
if TYPE_CHECKING:
import scipy
__author__ = 'Jan Petykiewicz'
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
class CSRMatrix(object): class CSRMatrix:
""" """
Matrix stored in Compressed Sparse Row format, in GPU RAM. Matrix stored in Compressed Sparse Row format, in GPU RAM.
""" """
row_ptr = None # type: pyopencl.array.Array row_ptr: pyopencl.array.Array
col_ind = None # type: pyopencl.array.Array col_ind: pyopencl.array.Array
data = None # type: pyopencl.array.Array data: pyopencl.array.Array
def __init__(self, def __init__(
self,
queue: pyopencl.CommandQueue, queue: pyopencl.CommandQueue,
m: 'scipy.sparse.csr_matrix'): m: 'scipy.sparse.csr_matrix',
) -> None:
self.row_ptr = pyopencl.array.to_device(queue, m.indptr) self.row_ptr = pyopencl.array.to_device(queue, m.indptr)
self.col_ind = pyopencl.array.to_device(queue, m.indices) self.col_ind = pyopencl.array.to_device(queue, m.indices)
self.data = pyopencl.array.to_device(queue, m.data.astype(numpy.complex128)) self.data = pyopencl.array.to_device(queue, m.data.astype(numpy.complex128))
def cg(A: 'scipy.sparse.csr_matrix', def cg(
b: numpy.ndarray, A: 'scipy.sparse.csr_matrix',
b: ArrayLike,
max_iters: int = 10000, max_iters: int = 10000,
err_threshold: float = 1e-6, err_threshold: float = 1e-6,
context: pyopencl.Context = None, context: pyopencl.Context | None = None,
queue: pyopencl.CommandQueue = None, queue: pyopencl.CommandQueue | None = None,
) -> numpy.ndarray: ) -> NDArray[complexfloating]:
""" """
General conjugate-gradient solver for sparse matrices, where A @ x = b. General conjugate-gradient solver for sparse matrices, where A @ x = b.
:param A: Matrix to solve (CSR format) Args:
:param b: Right-hand side vector (dense ndarray) A: Matrix to solve (CSR format)
:param max_iters: Maximum number of iterations b: Right-hand side vector (dense ndarray)
:param err_threshold: Error threshold for successful solve, relative to norm(b) max_iters: Maximum number of iterations
:param context: PyOpenCL context. Will be created if not given. err_threshold: Error threshold for successful solve, relative to norm(b)
:param queue: PyOpenCL command queue. Will be created if not given. context: PyOpenCL context. Will be created if not given.
:return: Solution vector x; returned even if solve doesn't converge. queue: PyOpenCL command queue. Will be created if not given.
Returns:
Solution vector x; returned even if solve doesn't converge.
""" """
start_time = time.perf_counter() start_time = time.perf_counter()
@ -76,10 +84,10 @@ def cg(A: 'scipy.sparse.csr_matrix',
if queue is None: if queue is None:
queue = pyopencl.CommandQueue(context) queue = pyopencl.CommandQueue(context)
def load_field(v, dtype=numpy.complex128): def load_field(v: NDArray[numpy.complexfloating], dtype: type = numpy.complex128) -> pyopencl.array.Array:
return pyopencl.array.to_device(queue, v.astype(dtype)) return pyopencl.array.to_device(queue, v.astype(dtype))
r = load_field(b) r = load_field(numpy.asarray(b))
x = pyopencl.array.zeros_like(r) x = pyopencl.array.zeros_like(r)
v = pyopencl.array.zeros_like(r) v = pyopencl.array.zeros_like(r)
p = pyopencl.array.zeros_like(r) p = pyopencl.array.zeros_like(r)
@ -90,27 +98,27 @@ def cg(A: 'scipy.sparse.csr_matrix',
m = CSRMatrix(queue, A) m = CSRMatrix(queue, A)
''' #
Generate OpenCL kernels # Generate OpenCL kernels
''' #
a_step = ops.create_a_csr(context) a_step = ops.create_a_csr(context)
xr_step = ops.create_xr_step(context) xr_step = ops.create_xr_step(context)
rhoerr_step = ops.create_rhoerr_step(context) rhoerr_step = ops.create_rhoerr_step(context)
p_step = ops.create_p_step(context) p_step = ops.create_p_step(context)
dot = ops.create_dot(context) dot = ops.create_dot(context)
''' #
Start the solve # Start the solve
''' #
start_time2 = time.perf_counter() start_time2 = time.perf_counter()
_, err2 = rhoerr_step(r, []) _, err2 = rhoerr_step(r, [])
b_norm = numpy.sqrt(err2) b_norm = numpy.sqrt(err2)
logging.debug('b_norm check: ', b_norm) logging.debug(f'b_norm check: {b_norm}')
success = False success = False
for k in range(max_iters): for k in range(max_iters):
logging.debug('[{:06d}] rho {:.4} alpha {:4.4}'.format(k, rho, alpha)) logging.debug(f'[{k:06d}] rho {rho:.4} alpha {alpha:4.4}')
rho_prev = rho rho_prev = rho
e = xr_step(x, p, r, v, alpha, []) e = xr_step(x, p, r, v, alpha, [])
@ -118,7 +126,7 @@ def cg(A: 'scipy.sparse.csr_matrix',
errs += [numpy.sqrt(err2) / b_norm] errs += [numpy.sqrt(err2) / b_norm]
logging.debug('err {}'.format(errs[-1])) logging.debug(f'err {errs[-1]}')
if errs[-1] < err_threshold: if errs[-1] < err_threshold:
success = True success = True
@ -128,12 +136,12 @@ def cg(A: 'scipy.sparse.csr_matrix',
e = a_step(v, m, p, e) e = a_step(v, m, p, e)
alpha = rho / dot(p, v, e) alpha = rho / dot(p, v, e)
if verbose and k % 1000 == 0: if k % 1000 == 0:
logging.info('iteration {}'.format(k)) logger.info(f'iteration {k}')
''' #
Done solving # Done solving
''' #
time_elapsed = time.perf_counter() - start_time time_elapsed = time.perf_counter() - start_time
x = x.get() x = x.get()
@ -142,38 +150,46 @@ def cg(A: 'scipy.sparse.csr_matrix',
logging.info('Solve success') logging.info('Solve success')
else: else:
logging.warning('Solve failure') logging.warning('Solve failure')
logging.info('{} iterations in {} sec: {} iterations/sec \ logging.info(f'{k} iterations in {time_elapsed} sec: {k / time_elapsed} iterations/sec')
'.format(k, time_elapsed, k / time_elapsed)) logging.debug(f'final error {errs[-1]}')
logging.debug('final error {}'.format(errs[-1])) logging.debug(f'overhead {start_time2 - start_time} sec')
logging.debug('overhead {} sec'.format(start_time2 - start_time))
logging.info('Final residual: {}'.format(norm(A @ x - b) / norm(b))) residual = norm(A @ x - b) / norm(b)
logging.info(f'Final residual: {residual}')
return x return x
def fdfd_cg_solver(solver_opts: Dict[str, Any] = None, def fdfd_cg_solver(
**fdfd_args solver_opts: dict[str, Any] | None = None,
) -> numpy.ndarray: **fdfd_args,
) -> NDArray[complexfloating]:
""" """
Conjugate gradient FDFD solver using CSR sparse matrices, mainly for Conjugate gradient FDFD solver using CSR sparse matrices, mainly for
testing and development since it's much slower than the solver in main.py. testing and development since it's much slower than the solver in main.py.
Calls fdfd_tools.solvers.generic(**fdfd_args, Calls meanas.fdfd.solvers.generic(
**fdfd_args,
matrix_solver=opencl_fdfd.csr.cg, matrix_solver=opencl_fdfd.csr.cg,
matrix_solver_opts=solver_opts) matrix_solver_opts=solver_opts,
)
:param solver_opts: Passed as matrix_solver_opts to fdfd_tools.solver.generic(...). Args:
solver_opts: Passed as matrix_solver_opts to fdfd_tools.solver.generic(...).
Default {}. Default {}.
:param fdfd_args: Passed as **fdfd_args to fdfd_tools.solver.generic(...). fdfd_args: Passed as **fdfd_args to fdfd_tools.solver.generic(...).
Should include all of the arguments **except** matrix_solver and matrix_solver_opts Should include all of the arguments **except** matrix_solver and matrix_solver_opts
:return: E-field which solves the system.
Returns:
E-field which solves the system.
""" """
if solver_opts is None: if solver_opts is None:
solver_opts = dict() solver_opts = dict()
x = fdfd_tools.solvers.generic(matrix_solver=cg, x = meanas.fdfd.solvers.generic(
matrix_solver=cg,
matrix_solver_opts=solver_opts, matrix_solver_opts=solver_opts,
**fdfd_args) **fdfd_args,
)
return x return x

View File

@ -5,69 +5,70 @@ This file holds the default FDFD solver, which uses an E-field wave
operator implemented directly as OpenCL arithmetic (rather than as operator implemented directly as OpenCL arithmetic (rather than as
a matrix). a matrix).
""" """
from typing import List
import time import time
import logging import logging
import numpy import numpy
from numpy.typing import NDArray, ArrayLike
from numpy.linalg import norm from numpy.linalg import norm
from numpy import floating, complexfloating
import pyopencl import pyopencl
import pyopencl.array import pyopencl.array
import fdfd_tools.operators import meanas.fdfd.operators
from . import ops from . import ops
__author__ = 'Jan Petykiewicz'
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
def cg_solver(omega: complex, def cg_solver(
dxes: List[List[numpy.ndarray]], omega: complex,
J: numpy.ndarray, dxes: list[list[NDArray[floating | complexfloating]]],
epsilon: numpy.ndarray, J: ArrayLike,
mu: numpy.ndarray = None, epsilon: ArrayLike,
pec: numpy.ndarray = None, mu: ArrayLike | None = None,
pmc: numpy.ndarray = None, pec: ArrayLike | None = None,
pmc: ArrayLike | None = None,
adjoint: bool = False, adjoint: bool = False,
max_iters: int = 40000, max_iters: int = 40000,
err_threshold: float = 1e-6, err_threshold: float = 1e-6,
context: pyopencl.Context = None, context: pyopencl.Context | None = None,
) -> numpy.ndarray: ) -> NDArray:
""" """
OpenCL FDFD solver using the iterative conjugate gradient (cg) method OpenCL FDFD solver using the iterative conjugate gradient (cg) method
and implementing the diagonalized E-field wave operator directly in and implementing the diagonalized E-field wave operator directly in
OpenCL. OpenCL.
All ndarray arguments should be 1D arrays. To linearize a list of 3 3D ndarrays, All ndarray arguments should be 1D arrays. To linearize a list of 3 3D ndarrays,
either use fdfd_tools.vec() or numpy: either use meanas.fdmath.vec() or numpy:
f_1D = numpy.hstack(tuple((fi.flatten(order='F') for fi in [f_x, f_y, f_z]))) f_1D = numpy.hstack(tuple((fi.flatten(order='F') for fi in [f_x, f_y, f_z])))
:param omega: Complex frequency to solve at. Args:
:param dxes: [[dx_e, dy_e, dz_e], [dx_h, dy_h, dz_h]] (complex cell sizes) omega: Complex frequency to solve at.
:param J: Electric current distribution (at E-field locations) dxes: [[dx_e, dy_e, dz_e], [dx_h, dy_h, dz_h]] (complex cell sizes)
:param epsilon: Dielectric constant distribution (at E-field locations) J: Electric current distribution (at E-field locations)
:param mu: Magnetic permeability distribution (at H-field locations) epsilon: Dielectric constant distribution (at E-field locations)
:param pec: Perfect electric conductor distribution mu: Magnetic permeability distribution (at H-field locations)
pec: Perfect electric conductor distribution
(at E-field locations; non-zero value indicates PEC is present) (at E-field locations; non-zero value indicates PEC is present)
:param pmc: Perfect magnetic conductor distribution pmc: Perfect magnetic conductor distribution
(at H-field locations; non-zero value indicates PMC is present) (at H-field locations; non-zero value indicates PMC is present)
:param adjoint: If true, solves the adjoint problem. adjoint: If true, solves the adjoint problem.
:param max_iters: Maximum number of iterations. Default 40,000. max_iters: Maximum number of iterations. Default 40,000.
:param err_threshold: If (r @ r.conj()) / norm(1j * omega * J) < err_threshold, success. err_threshold: If (r @ r.conj()) / norm(1j * omega * J) < err_threshold, success.
Default 1e-6. Default 1e-6.
:param context: PyOpenCL context to run in. If not given, construct a new context. context: PyOpenCL context to run in. If not given, construct a new context.
:return: E-field which solves the system. Returned even if we did not converge.
"""
Returns:
E-field which solves the system. Returned even if we did not converge.
"""
start_time = time.perf_counter() start_time = time.perf_counter()
b = -1j * omega * J shape = [dd.size for dd in dxes[0]]
shape = [d.size for d in dxes[0]] b = -1j * omega * numpy.asarray(J)
''' '''
** In this comment, I use the following notation: ** In this comment, I use the following notation:
@ -96,30 +97,29 @@ def cg_solver(omega: complex,
We can accomplish all this simply by conjugating everything (except J) and We can accomplish all this simply by conjugating everything (except J) and
reversing the order of L and R reversing the order of L and R
''' '''
epsilon = numpy.asarray(epsilon)
if adjoint: if adjoint:
# Conjugate everything # Conjugate everything
dxes = [[numpy.conj(d) for d in dd] for dd in dxes] dxes = [[numpy.conj(dd) for dd in dds] for dds in dxes]
omega = numpy.conj(omega) omega = numpy.conj(omega)
epsilon = numpy.conj(epsilon) epsilon = numpy.conj(epsilon)
if mu is not None: if mu is not None:
mu = numpy.conj(mu) mu = numpy.conj(mu)
assert isinstance(epsilon, NDArray[floating] | NDArray[complexfloating])
L, R = fdfd_tools.operators.e_full_preconditioners(dxes) L, R = meanas.fdfd.operators.e_full_preconditioners(dxes)
b_preconditioned = (R if adjoint else L) @ b
if adjoint: #
b_preconditioned = R @ b # Allocate GPU memory and load in data
else: #
b_preconditioned = L @ b
'''
Allocate GPU memory and load in data
'''
if context is None: if context is None:
context = pyopencl.create_some_context(interactive=True) context = pyopencl.create_some_context(interactive=True)
queue = pyopencl.CommandQueue(context) queue = pyopencl.CommandQueue(context)
def load_field(v, dtype=numpy.complex128): def load_field(v: NDArray[complexfloating | floating], dtype: type = numpy.complex128) -> pyopencl.array.Array:
return pyopencl.array.to_device(queue, v.astype(dtype)) return pyopencl.array.to_device(queue, v.astype(dtype))
r = load_field(b_preconditioned) # load preconditioned b into r r = load_field(b_preconditioned) # load preconditioned b into r
@ -132,30 +132,31 @@ def cg_solver(omega: complex,
rho = 1.0 + 0j rho = 1.0 + 0j
errs = [] errs = []
inv_dxes = [[load_field(1 / d) for d in dd] for dd in dxes] inv_dxes = [[load_field(1 / numpy.asarray(dd)) for dd in dds] for dds in dxes]
oeps = load_field(-omega ** 2 * epsilon) oeps = load_field(-omega * omega * epsilon)
Pl = load_field(L.diagonal()) Pl = load_field(L.diagonal())
Pr = load_field(R.diagonal()) Pr = load_field(R.diagonal())
if mu is None: if mu is None:
invm = load_field(numpy.array([])) invm = load_field(numpy.array([]))
else: else:
invm = load_field(1 / mu) invm = load_field(1 / numpy.asarray(mu))
mu = numpy.asarray(mu)
if pec is None: if pec is None:
gpec = load_field(numpy.array([]), dtype=numpy.int8) gpec = load_field(numpy.array([]), dtype=numpy.int8)
else: else:
gpec = load_field(pec.astype(bool), dtype=numpy.int8) gpec = load_field(numpy.asarray(pec, dtype=bool), dtype=numpy.int8)
if pmc is None: if pmc is None:
gpmc = load_field(numpy.array([]), dtype=numpy.int8) gpmc = load_field(numpy.array([]), dtype=numpy.int8)
else: else:
gpmc = load_field(pmc.astype(bool), dtype=numpy.int8) gpmc = load_field(numpy.asarray(pmc, dtype=bool), dtype=numpy.int8)
''' #
Generate OpenCL kernels # Generate OpenCL kernels
''' #
has_mu, has_pec, has_pmc = [q is not None for q in (mu, pec, pmc)] has_mu, has_pec, has_pmc = (qq is not None for qq in (mu, pec, pmc))
a_step_full = ops.create_a(context, shape, has_mu, has_pec, has_pmc) a_step_full = ops.create_a(context, shape, has_mu, has_pec, has_pmc)
xr_step = ops.create_xr_step(context) xr_step = ops.create_xr_step(context)
@ -163,23 +164,28 @@ def cg_solver(omega: complex,
p_step = ops.create_p_step(context) p_step = ops.create_p_step(context)
dot = ops.create_dot(context) dot = ops.create_dot(context)
def a_step(E, H, p, events): def a_step(
E: pyopencl.array.Array,
H: pyopencl.array.Array,
p: pyopencl.array.Array,
events: list[pyopencl.Event],
) -> list[pyopencl.Event]:
return a_step_full(E, H, p, inv_dxes, oeps, invm, gpec, gpmc, Pl, Pr, events) return a_step_full(E, H, p, inv_dxes, oeps, invm, gpec, gpmc, Pl, Pr, events)
''' #
Start the solve # Start the solve
''' #
start_time2 = time.perf_counter() start_time2 = time.perf_counter()
_, err2 = rhoerr_step(r, []) _, err2 = rhoerr_step(r, [])
b_norm = numpy.sqrt(err2) b_norm = numpy.sqrt(err2)
logging.debug('b_norm check: {}'.format(b_norm)) logging.debug(f'b_norm check: {b_norm}')
success = False success = False
for k in range(max_iters): for k in range(max_iters):
do_print = (k % 100 == 0) do_print = (k % 100 == 0)
if do_print: if do_print:
logger.debug('[{:06d}] rho {:.4} alpha {:4.4}'.format(k, rho, alpha)) logger.debug(f'[{k:06d}] rho {rho:.4} alpha {alpha:4.4}')
rho_prev = rho rho_prev = rho
e = xr_step(x, p, r, v, alpha, []) e = xr_step(x, p, r, v, alpha, [])
@ -188,7 +194,7 @@ def cg_solver(omega: complex,
errs += [numpy.sqrt(err2) / b_norm] errs += [numpy.sqrt(err2) / b_norm]
if do_print: if do_print:
logger.debug('err {}'.format(errs[-1])) logger.debug(f'err {errs[-1]}')
if errs[-1] < err_threshold: if errs[-1] < err_threshold:
success = True success = True
@ -199,32 +205,30 @@ def cg_solver(omega: complex,
alpha = rho / dot(p, v, e) alpha = rho / dot(p, v, e)
if k % 1000 == 0: if k % 1000 == 0:
logger.info('iteration {}'.format(k)) logger.info(f'iteration {k}')
''' #
Done solving # Done solving
''' #
time_elapsed = time.perf_counter() - start_time time_elapsed = time.perf_counter() - start_time
# Undo preconditioners # Undo preconditioners
if adjoint: x = ((Pl if adjoint else Pr) * x).get()
x = (Pl * x).get()
else:
x = (Pr * x).get()
if success: if success:
logger.info('Solve success') logger.info('Solve success')
else: else:
logger.warning('Solve failure') logger.warning('Solve failure')
logger.info('{} iterations in {} sec: {} iterations/sec \ logger.info(f'{k} iterations in {time_elapsed} sec: {k / time_elapsed} iterations/sec')
'.format(k, time_elapsed, k / time_elapsed)) logger.debug(f'final error {errs[-1]}')
logger.debug('final error {}'.format(errs[-1])) logger.debug(f'overhead {start_time2 - start_time} sec')
logger.debug('overhead {} sec'.format(start_time2 - start_time))
A0 = fdfd_tools.operators.e_full(omega, dxes, epsilon, mu).tocsr() A0 = meanas.fdfd.operators.e_full(omega, dxes, epsilon, mu).tocsr()
if adjoint: if adjoint:
# Remember we conjugated all the contents of A earlier # Remember we conjugated all the contents of A earlier
A0 = A0.T A0 = A0.T
logger.info('Post-everything residual: {}'.format(norm(A0 @ x - b) / norm(b)))
residual = norm(A0 @ x - b) / norm(b)
logger.info(f'Post-everything residual: {residual}')
return x return x

View File

@ -7,10 +7,11 @@ kernels for use by the other solvers.
See kernels/ for any of the .cl files loaded in this file. See kernels/ for any of the .cl files loaded in this file.
""" """
from typing import List, Callable from collections.abc import Callable, Sequence
import logging import logging
import numpy import numpy
from numpy.typing import ArrayLike
import jinja2 import jinja2
import pyopencl import pyopencl
@ -19,57 +20,73 @@ from pyopencl.elementwise import ElementwiseKernel
from pyopencl.reduction import ReductionKernel from pyopencl.reduction import ReductionKernel
from .csr import CSRMatrix
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
class FDFDError(Exception):
""" Custom error for opencl_fdfd """
pass
# Create jinja2 env on module load # Create jinja2 env on module load
jinja_env = jinja2.Environment(loader=jinja2.PackageLoader(__name__, 'kernels')) jinja_env = jinja2.Environment(loader=jinja2.PackageLoader(__name__, 'kernels'))
# Return type for the create_opname(...) functions # Return type for the create_opname(...) functions
operation = Callable[..., List[pyopencl.Event]] operation = Callable[..., list[pyopencl.Event]]
def type_to_C(float_type: numpy.float32 or numpy.float64) -> str: def type_to_C(
float_type: type[numpy.floating | numpy.complexfloating],
) -> str:
""" """
Returns a string corresponding to the C equivalent of a numpy type. Returns a string corresponding to the C equivalent of a numpy type.
:param float_type: numpy type: float32, float64, complex64, complex128 Args:
:return: string containing the corresponding C type (eg. 'double') float_type: numpy type: float32, float64, complex64, complex128
Returns:
string containing the corresponding C type (eg. 'double')
""" """
types = { types = {
numpy.float16: 'half',
numpy.float32: 'float', numpy.float32: 'float',
numpy.float64: 'double', numpy.float64: 'double',
numpy.complex64: 'cfloat_t', numpy.complex64: 'cfloat_t',
numpy.complex128: 'cdouble_t', numpy.complex128: 'cdouble_t',
} }
if float_type not in types: if float_type not in types:
raise Exception('Unsupported type') raise FDFDError('Unsupported type')
return types[float_type] return types[float_type]
# Type names # Type names
ctype = type_to_C(numpy.complex128) ctype = type_to_C(numpy.complex128)
ctype_bare = 'cdouble' ctype_bare = 'cdouble'
# Preamble for all OpenCL code # Preamble for all OpenCL code
preamble = ''' preamble = f'''
#define PYOPENCL_DEFINE_CDOUBLE #define PYOPENCL_DEFINE_CDOUBLE
#include <pyopencl-complex.h> #include <pyopencl-complex.h>
//Defines to clean up operation and type names //Defines to clean up operation and type names
#define ctype {ctype}_t #define ctype {ctype_bare}_t
#define zero {ctype}_new(0.0, 0.0) #define zero {ctype_bare}_new(0.0, 0.0)
#define add {ctype}_add #define add {ctype_bare}_add
#define sub {ctype}_sub #define sub {ctype_bare}_sub
#define mul {ctype}_mul #define mul {ctype_bare}_mul
'''.format(ctype=ctype_bare) '''
def ptrs(*args: str) -> List[str]: def ptrs(*args: str) -> list[str]:
return [ctype + ' *' + s for s in args] return [ctype + ' *' + s for s in args]
def create_a(context: pyopencl.Context, def create_a(
shape: numpy.ndarray, context: pyopencl.Context,
shape: ArrayLike,
mu: bool = False, mu: bool = False,
pec: bool = False, pec: bool = False,
pmc: bool = False, pmc: bool = False,
@ -94,12 +111,15 @@ def create_a(context: pyopencl.Context,
and returns a list of pyopencl.Event. and returns a list of pyopencl.Event.
:param context: PyOpenCL context Args:
:param shape: Dimensions of the E-field context: PyOpenCL context
:param mu: False iff (mu == 1) everywhere shape: Dimensions of the E-field
:param pec: False iff no PEC anywhere mu: False iff (mu == 1) everywhere
:param pmc: False iff no PMC anywhere pec: False iff no PEC anywhere
:return: Function for computing (A @ p) pmc: False iff no PMC anywhere
Returns:
Function for computing (A @ p)
""" """
common_source = jinja_env.get_template('common.cl').render(shape=shape) common_source = jinja_env.get_template('common.cl').render(shape=shape)
@ -109,49 +129,71 @@ def create_a(context: pyopencl.Context,
des = [ctype + ' *inv_de' + a for a in 'xyz'] des = [ctype + ' *inv_de' + a for a in 'xyz']
dhs = [ctype + ' *inv_dh' + a for a in 'xyz'] dhs = [ctype + ' *inv_dh' + a for a in 'xyz']
''' #
Convert p to initial E (ie, apply right preconditioner and PEC) # Convert p to initial E (ie, apply right preconditioner and PEC)
''' #
p2e_source = jinja_env.get_template('p2e.cl').render(pec=pec) p2e_source = jinja_env.get_template('p2e.cl').render(pec=pec)
P2E_kernel = ElementwiseKernel(context, P2E_kernel = ElementwiseKernel(
context,
name='P2E', name='P2E',
preamble=preamble, preamble=preamble,
operation=p2e_source, operation=p2e_source,
arguments=', '.join(ptrs('E', 'p', 'Pr') + pec_arg)) arguments=', '.join(ptrs('E', 'p', 'Pr') + pec_arg),
)
''' #
Calculate intermediate H from intermediate E # Calculate intermediate H from intermediate E
''' #
e2h_source = jinja_env.get_template('e2h.cl').render(mu=mu, e2h_source = jinja_env.get_template('e2h.cl').render(
mu=mu,
pmc=pmc, pmc=pmc,
common_cl=common_source) common_cl=common_source,
E2H_kernel = ElementwiseKernel(context, )
E2H_kernel = ElementwiseKernel(
context,
name='E2H', name='E2H',
preamble=preamble, preamble=preamble,
operation=e2h_source, operation=e2h_source,
arguments=', '.join(ptrs('E', 'H', 'inv_mu') + pmc_arg + des)) arguments=', '.join(ptrs('E', 'H', 'inv_mu') + pmc_arg + des),
)
''' #
Calculate final E (including left preconditioner) # Calculate final E (including left preconditioner)
''' #
h2e_source = jinja_env.get_template('h2e.cl').render(pec=pec, h2e_source = jinja_env.get_template('h2e.cl').render(
common_cl=common_source) pec=pec,
H2E_kernel = ElementwiseKernel(context, common_cl=common_source,
)
H2E_kernel = ElementwiseKernel(
context,
name='H2E', name='H2E',
preamble=preamble, preamble=preamble,
operation=h2e_source, operation=h2e_source,
arguments=', '.join(ptrs('E', 'H', 'oeps', 'Pl') + pec_arg + dhs)) arguments=', '.join(ptrs('E', 'H', 'oeps', 'Pl') + pec_arg + dhs),
)
def spmv(E, H, p, idxes, oeps, inv_mu, pec, pmc, Pl, Pr, e): def spmv(
E: pyopencl.array.Array,
H: pyopencl.array.Array,
p: pyopencl.array.Array,
idxes: Sequence[Sequence[pyopencl.array.Array]],
oeps: pyopencl.array.Array,
inv_mu: pyopencl.array.Array | None,
pec: pyopencl.array.Array | None,
pmc: pyopencl.array.Array | None,
Pl: pyopencl.array.Array,
Pr: pyopencl.array.Array,
e: list[pyopencl.Event],
) -> list[pyopencl.Event]:
e2 = P2E_kernel(E, p, Pr, pec, wait_for=e) e2 = P2E_kernel(E, p, Pr, pec, wait_for=e)
e2 = E2H_kernel(E, H, inv_mu, pmc, *idxes[0], wait_for=[e2]) e2 = E2H_kernel(E, H, inv_mu, pmc, *idxes[0], wait_for=[e2])
e2 = H2E_kernel(E, H, oeps, Pl, pec, *idxes[1], wait_for=[e2]) e2 = H2E_kernel(E, H, oeps, Pl, pec, *idxes[1], wait_for=[e2])
return [e2] return [e2]
logger.debug('Preamble: \n{}'.format(preamble)) logger.debug(f'Preamble: \n{preamble}')
logger.debug('p2e: \n{}'.format(p2e_source)) logger.debug(f'p2e: \n{p2e_source}')
logger.debug('e2h: \n{}'.format(e2h_source)) logger.debug(f'e2h: \n{e2h_source}')
logger.debug('h2e: \n{}'.format(h2e_source)) logger.debug(f'h2e: \n{h2e_source}')
return spmv return spmv
@ -167,8 +209,11 @@ def create_xr_step(context: pyopencl.Context) -> operation:
after waiting for all in the list e after waiting for all in the list e
and returns a list of pyopencl.Event and returns a list of pyopencl.Event
:param context: PyOpenCL context Args:
:return: Function for performing x and r updates context: PyOpenCL context
Returns:
Function for performing x and r updates
""" """
update_xr_source = ''' update_xr_source = '''
x[i] = add(x[i], mul(alpha, p[i])); x[i] = add(x[i], mul(alpha, p[i]));
@ -177,19 +222,28 @@ def create_xr_step(context: pyopencl.Context) -> operation:
xr_args = ', '.join(ptrs('x', 'p', 'r', 'v') + [ctype + ' alpha']) xr_args = ', '.join(ptrs('x', 'p', 'r', 'v') + [ctype + ' alpha'])
xr_kernel = ElementwiseKernel(context, xr_kernel = ElementwiseKernel(
context,
name='XR', name='XR',
preamble=preamble, preamble=preamble,
operation=update_xr_source, operation=update_xr_source,
arguments=xr_args) arguments=xr_args,
)
def xr_update(x, p, r, v, alpha, e): def xr_update(
x: pyopencl.array.Array,
p: pyopencl.array.Array,
r: pyopencl.array.Array,
v: pyopencl.array.Array,
alpha: complex,
e: list[pyopencl.Event],
) -> list[pyopencl.Event]:
return [xr_kernel(x, p, r, v, alpha, wait_for=e)] return [xr_kernel(x, p, r, v, alpha, wait_for=e)]
return xr_update return xr_update
def create_rhoerr_step(context: pyopencl.Context) -> operation: def create_rhoerr_step(context: pyopencl.Context) -> Callable[..., tuple[complex, complex]]:
""" """
Return a function Return a function
ri_update(r, e) ri_update(r, e)
@ -200,8 +254,11 @@ def create_rhoerr_step(context: pyopencl.Context) -> operation:
after waiting for all pyopencl.Event in the list e after waiting for all pyopencl.Event in the list e
and returns a list of pyopencl.Event and returns a list of pyopencl.Event
:param context: PyOpenCL context Args:
:return: Function for performing x and r updates context: PyOpenCL context
Returns:
Function for performing x and r updates
""" """
update_ri_source = ''' update_ri_source = '''
@ -213,18 +270,20 @@ def create_rhoerr_step(context: pyopencl.Context) -> operation:
# Use a vector type (double3) to make the reduction simpler # Use a vector type (double3) to make the reduction simpler
ri_dtype = pyopencl.array.vec.double3 ri_dtype = pyopencl.array.vec.double3
ri_kernel = ReductionKernel(context, ri_kernel = ReductionKernel(
context,
name='RHOERR', name='RHOERR',
preamble=preamble, preamble=preamble,
dtype_out=ri_dtype, dtype_out=ri_dtype,
neutral='(double3)(0.0, 0.0, 0.0)', neutral='(double3)(0.0, 0.0, 0.0)',
map_expr=update_ri_source, map_expr=update_ri_source,
reduce_expr='a+b', reduce_expr='a+b',
arguments=ctype + ' *r') arguments=ctype + ' *r',
)
def ri_update(r, e): def ri_update(r: pyopencl.array.Array, e: list[pyopencl.Event]) -> tuple[complex, complex]:
g = ri_kernel(r, wait_for=e).astype(ri_dtype).get() g = ri_kernel(r, wait_for=e).astype(ri_dtype).get()
rr, ri, ii = [g[q] for q in 'xyz'] rr, ri, ii = (g[qq] for qq in 'xyz')
rho = rr + 2j * ri - ii rho = rr + 2j * ri - ii
err = rr + ii err = rr + ii
return rho, err return rho, err
@ -242,48 +301,66 @@ def create_p_step(context: pyopencl.Context) -> operation:
after waiting for all pyopencl.Event in the list e after waiting for all pyopencl.Event in the list e
and returns a list of pyopencl.Event and returns a list of pyopencl.Event
:param context: PyOpenCL context Args:
:return: Function for performing the p update context: PyOpenCL context
Returns:
Function for performing the p update
""" """
update_p_source = ''' update_p_source = '''
p[i] = add(r[i], mul(beta, p[i])); p[i] = add(r[i], mul(beta, p[i]));
''' '''
p_args = ptrs('p', 'r') + [ctype + ' beta'] p_args = ptrs('p', 'r') + [ctype + ' beta']
p_kernel = ElementwiseKernel(context, p_kernel = ElementwiseKernel(
context,
name='P', name='P',
preamble=preamble, preamble=preamble,
operation=update_p_source, operation=update_p_source,
arguments=', '.join(p_args)) arguments=', '.join(p_args),
)
def p_update(p, r, beta, e): def p_update(
p: pyopencl.array.Array,
r: pyopencl.array.Array,
beta: complex,
e: list[pyopencl.Event]) -> list[pyopencl.Event]:
return [p_kernel(p, r, beta, wait_for=e)] return [p_kernel(p, r, beta, wait_for=e)]
return p_update return p_update
def create_dot(context: pyopencl.Context) -> operation: def create_dot(context: pyopencl.Context) -> Callable[..., complex]:
""" """
Return a function for performing the dot product Return a function for performing the dot product
p @ v p @ v
with the signature with the signature
dot(p, v, e) -> float dot(p, v, e) -> complex
:param context: PyOpenCL context Args:
:return: Function for performing the dot product context: PyOpenCL context
Returns:
Function for performing the dot product
""" """
dot_dtype = numpy.complex128 dot_dtype = numpy.complex128
dot_kernel = ReductionKernel(context, dot_kernel = ReductionKernel(
context,
name='dot', name='dot',
preamble=preamble, preamble=preamble,
dtype_out=dot_dtype, dtype_out=dot_dtype,
neutral='zero', neutral='zero',
map_expr='mul(p[i], v[i])', map_expr='mul(p[i], v[i])',
reduce_expr='add(a, b)', reduce_expr='add(a, b)',
arguments=ptrs('p', 'v')) arguments=ptrs('p', 'v'),
)
def dot(p, v, e): def dot(
p: pyopencl.array.Array,
v: pyopencl.array.Array,
e: list[pyopencl.Event],
) -> complex:
g = dot_kernel(p, v, wait_for=e) g = dot_kernel(p, v, wait_for=e)
return g.get() return g.get()
@ -304,8 +381,11 @@ def create_a_csr(context: pyopencl.Context) -> operation:
The function waits on all the pyopencl.Event in e before running, and returns The function waits on all the pyopencl.Event in e before running, and returns
a list of pyopencl.Event. a list of pyopencl.Event.
:param context: PyOpenCL context Args:
:return: Function for sparse (M @ v) operation where M is in CSR format context: PyOpenCL context
Returns:
Function for sparse (M @ v) operation where M is in CSR format
""" """
spmv_source = ''' spmv_source = '''
int start = m_row_ptr[i]; int start = m_row_ptr[i];
@ -326,13 +406,20 @@ def create_a_csr(context: pyopencl.Context) -> operation:
m_args = 'int *m_row_ptr, int *m_col_ind, ' + ctype + ' *m_data' m_args = 'int *m_row_ptr, int *m_col_ind, ' + ctype + ' *m_data'
v_in_args = ctype + ' *v_in' v_in_args = ctype + ' *v_in'
spmv_kernel = ElementwiseKernel(context, spmv_kernel = ElementwiseKernel(
context,
name='csr_spmv', name='csr_spmv',
preamble=preamble, preamble=preamble,
operation=spmv_source, operation=spmv_source,
arguments=', '.join((v_out_args, m_args, v_in_args))) arguments=', '.join((v_out_args, m_args, v_in_args)),
)
def spmv(v_out, m, v_in, e): def spmv(
v_out: pyopencl.array.Array,
m: CSRMatrix,
v_in: pyopencl.array.Array,
e: list[pyopencl.Event],
) -> list[pyopencl.Event]:
return [spmv_kernel(v_out, m.row_ptr, m.col_ind, m.data, v_in, wait_for=e)] return [spmv_kernel(v_out, m.row_ptr, m.col_ind, m.data, v_in, wait_for=e)]
return spmv return spmv

0
opencl_fdfd/py.typed Normal file
View File

96
pyproject.toml Normal file
View File

@ -0,0 +1,96 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "opencl_fdfd"
description = "OpenCL FDFD solver"
readme = "README.md"
license = { file = "LICENSE.md" }
authors = [
{ name="Jan Petykiewicz", email="jan@mpxd.net" },
]
homepage = "https://mpxd.net/code/jan/opencl_fdfd"
repository = "https://mpxd.net/code/jan/opencl_fdfd"
keywords = [
"FDFD",
"finite",
"difference",
"frequency",
"domain",
"simulation",
"optics",
"electromagnetic",
"dielectric",
"PML",
"solver",
"FDTD",
]
classifiers = [
"Programming Language :: Python :: 3",
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"Intended Audience :: Manufacturing",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: GNU Affero General Public License v3",
"Topic :: Scientific/Engineering",
]
requires-python = ">=3.11"
dynamic = ["version"]
dependencies = [
"numpy>=1.26",
"pyopencl",
"jinja2",
"meanas>=0.5",
]
[tool.hatch.version]
path = "opencl_fdfd/__init__.py"
[tool.ruff]
exclude = [
".git",
"dist",
]
line-length = 145
indent-width = 4
lint.dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
lint.select = [
"NPY", "E", "F", "W", "B", "ANN", "UP", "SLOT", "SIM", "LOG",
"C4", "ISC", "PIE", "PT", "RET", "TCH", "PTH", "INT",
"ARG", "PL", "R", "TRY",
"G010", "G101", "G201", "G202",
"Q002", "Q003", "Q004",
]
lint.ignore = [
#"ANN001", # No annotation
"ANN002", # *args
"ANN003", # **kwargs
"ANN401", # Any
"ANN101", # self: Self
"SIM108", # single-line if / else assignment
"RET504", # x=y+z; return x
"PIE790", # unnecessary pass
"ISC003", # non-implicit string concatenation
"C408", # dict(x=y) instead of {'x': y}
"PLR09", # Too many xxx
"PLR2004", # magic number
"PLC0414", # import x as x
"TRY003", # Long exception message
]
[[tool.mypy.overrides]]
module = [
"scipy",
"scipy.optimize",
"scipy.linalg",
"scipy.sparse",
"scipy.sparse.linalg",
"pyopencl",
"pyopencl.array",
"pyopencl.elementwise",
"pyopencl.reduction",
]
ignore_missing_imports = true

View File

@ -1,30 +0,0 @@
#!/usr/bin/env python3
from setuptools import setup, find_packages
import opencl_fdfd
with open('README.md', 'r') as f:
long_description = f.read()
setup(name='opencl_fdfd',
version=opencl_fdfd.version,
description='OpenCL FDFD solver',
long_description=long_description,
long_description_content_type='text/markdown',
author='Jan Petykiewicz',
author_email='anewusername@gmail.com',
url='https://mpxd.net/code/jan/opencl_fdfd',
packages=find_packages(),
package_data={
'opencl_fdfd': ['kernels/*']
},
install_requires=[
'numpy',
'pyopencl',
'jinja2',
'fdfd_tools>=0.3',
],
extras_require={
},
)