Passing pywrapfst objects to C++ using Cython
OpenFst provides a Python binding, which can be imported with import pywrapfst.
In this article, I’ll describe how to pass pywrapfst objects (such as pywrapfst.MutableFst) to C++ code using Cython.
Here’s the directory structure. I’ll describe the contents of each file later.
$ tree
.
├── Dockerfile
├── fstsample.pyx
├── include
│   └── base
│       └── integral_types.h
├── setup.py
└── src
    ├── fstsample.cpp
    └── fstsample.hpp
Prep
Build Docker image
I’ve prepared a Dockerfile so that this can be tried without installing OpenFst on the host machine.
OpenFst 1.8.0 requires gcc >= 7, so centos:centos8 is used as the base image.
FROM centos:centos8
RUN dnf update -y \
    && dnf group install -y "Development Tools" \
    && dnf install -y python36-devel \
    && dnf clean all \
    && rm -rf /var/cache/dnf/*
RUN pip3 install -U pip setuptools \
    && pip3 install cython \
    && rm -rf ~/.cache/pip
Build a Docker image and start a container.
$ docker build --no-cache -t openfst_builder .
$ docker run --rm -it -v ${PWD}:${PWD} -w ${PWD} openfst_builder /bin/bash
Check the versions.
# python3 --version
Python 3.6.8
# pip3 list
Package Version
--------------- -------
Cython 0.29.21
gpg 1.10.0
pip 20.2.4
python-dateutil 2.6.1
rpm 4.14.2
setuptools 50.3.2
six 1.11.0
Build OpenFst
Fetch openfst-1.8.0.tar.gz, which is the latest version as of November 2020, from the OpenFst download page.
# curl -LO http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.8.0.tar.gz
# tar zxvf openfst-1.8.0.tar.gz
Build and install OpenFst. Be sure to run configure with the --enable-python option so that the Python binding is also built.
# pushd openfst-1.8.0
# ./configure --enable-python
# make -j4
# make install
# popd
Now binaries, headers, and libraries are installed in /usr/local. The Python binding is installed in /usr/local/lib/python3.6/site-packages.
# tree /usr/local/lib/python3.6/
/usr/local/lib/python3.6
└── site-packages
    ├── pywrapfst.la
    └── pywrapfst.so
So I can import pywrapfst and use its functions.
# python3
>>> import pywrapfst
>>>
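To check where Python would load the binding from without actually importing it, something like the following could be used. This is only a sketch: module_origin is a hypothetical helper name, and it relies on importlib.util.find_spec from the standard library.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# In the container described above, this is expected to point into
# /usr/local/lib/python3.6/site-packages; elsewhere it prints None.
print(module_origin("pywrapfst"))
```

If this prints None, the binding is not on sys.path, which usually means configure was run without --enable-python or a different Python interpreter is being used.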
Compile an example FST
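Following the OpenFst quick tour, write an FST in text format along with its input and output symbol tables, then compile it with fstcompile.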
$ cat >text.fst <<EOF
0 1 a x .5
0 1 b y 1.5
1 2 c z 2.5
2 3.5
EOF
$ cat >isyms.txt <<EOF
<eps> 0
a 1
b 2
c 3
EOF
$ cat >osyms.txt <<EOF
<eps> 0
x 1
y 2
z 3
EOF
$ fstcompile --isymbols=isyms.txt --osymbols=osyms.txt text.fst sample.fst
This FST can be read in Python using pywrapfst.
# python3
>>> import pywrapfst
>>> fst = pywrapfst.Fst.read('sample.fst')
>>> type(fst)
<class 'pywrapfst.MutableFst'>
>>> print(fst)
0 1 1 1 0.5
0 1 2 2 1.5
1 2 3 3 2.5
2 3.5
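The printed FST shows integer labels instead of a, x, and so on, because the symbols were replaced with their IDs at compile time. The pure-Python sketch below mimics that lookup to make the mapping explicit. It is illustrative only: parse_symbols and compile_text_fst are hypothetical helpers, not part of OpenFst, and a missing weight is assumed to default to 0.0.

```python
def parse_symbols(text):
    """Parse a symbol table with one 'symbol id' pair per line."""
    table = {}
    for line in text.strip().splitlines():
        symbol, index = line.split()
        table[symbol] = int(index)
    return table


def compile_text_fst(text, isyms, osyms):
    """Replace symbols with integer labels in text-format FST lines."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # Arc line: source, destination, input symbol, output symbol, [weight]
            src, dst, ilabel, olabel = fields[:4]
            weight = float(fields[4]) if len(fields) > 4 else 0.0
            rows.append((int(src), int(dst), isyms[ilabel], osyms[olabel], weight))
        else:
            # Final-state line: state, [weight]
            weight = float(fields[1]) if len(fields) > 1 else 0.0
            rows.append((int(fields[0]), weight))
    return rows


# Same contents as isyms.txt, osyms.txt, and text.fst above.
ISYMS = parse_symbols("<eps> 0\na 1\nb 2\nc 3")
OSYMS = parse_symbols("<eps> 0\nx 1\ny 2\nz 3")
TEXT_FST = "0 1 a x .5\n0 1 b y 1.5\n1 2 c z 2.5\n2 3.5"

ROWS = compile_text_fst(TEXT_FST, ISYMS, OSYMS)
for row in ROWS:
    print(*row)
```

The printed rows match the output of print(fst) above, for example 0 1 1 1 0.5 for the arc 0 1 a x .5.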
Build C++ wrapper
I have the following C++ files, which provide a feature similar to the fstinfo command.
src/fstsample.hpp
#ifndef FSTSAMPLE_HPP_
#define FSTSAMPLE_HPP_
#include <fst/script/fst-class.h>
#include <memory>
namespace fstsample {
void PrintFstInfo(std::shared_ptr<fst::script::FstClass> fst);
} // namespace fstsample
#endif // FSTSAMPLE_HPP_
src/fstsample.cpp
#include "fstsample.hpp"
#include <fst/script/info.h>
#include <string>
namespace fstsample {
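void PrintFstInfo(std::shared_ptr<fst::script::FstClass> fst) {
  // These values correspond to options of the fstinfo command.
  bool test_properties = true;           // compute properties not already known
  const std::string arc_filter = "any";  // consider every arc
  const std::string info_type = "auto";  // let OpenFst choose the output format
  bool verify = true;                    // check the FST for sanity
  fst::script::Info(*fst, test_properties, arc_filter,
                    info_type, verify);
}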
} // namespace fstsample
Of course, I can call the PrintFstInfo function from C++, but how can I use it from Python?
The answer is to write a wrapper for the C++ function in Cython.
fstsample.pyx
# distutils: language = c++
# cython: language_level=3
cimport pywrapfst
from libcpp.memory cimport shared_ptr
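cdef extern from "src/fstsample.hpp" namespace "fstsample":
    # The header takes shared_ptr<FstClass>; a shared_ptr to the derived
    # MutableFstClass converts to it implicitly on the C++ side.
    void PrintFstInfo(shared_ptr[pywrapfst.fst.MutableFstClass])

def print_fst_info(pywrapfst.MutableFst fst):
    PrintFstInfo(fst._mfst)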
What this .pyx file does is:
- The # distutils: language = c++ line indicates that the .pyx file must be compiled as C++.
- cimport pywrapfst imports pywrapfst.pxd, which is in the src/extensions/python directory of OpenFst.
- cdef extern from "src/fstsample.hpp" exposes the functions in src/fstsample.hpp to Cython. But that’s not enough: I also have to declare the function with void PrintFstInfo(shared_ptr[pywrapfst.fst.MutableFstClass]) so that Cython can use it.
- print_fst_info is a function that can be called from Python. As shown above, the type of the FST is pywrapfst.MutableFst, so the argument type is also pywrapfst.MutableFst.
- pywrapfst.MutableFst has a _mfst member variable, which is a shared_ptr to pywrapfst.fst.MutableFstClass, and it’s passed to PrintFstInfo().
setup.py
from distutils.core import Extension, setup
from Cython.Build import cythonize
extensions = [
    Extension(
        "fstsample",
        sources=["fstsample.pyx", "src/fstsample.cpp"],
        include_dirs=[
            # to include 'base/integral_types.h'
            "include",
            # to include fst/extensions headers
            "openfst-1.8.0/src/include",
        ],
        libraries=["fstscript"],
        extra_compile_args=["-std=c++17"],
    )
]
setup(
    name="fstsample",
    ext_modules=cythonize(
        extensions,
        # to include *.pxd
        include_path=["openfst-1.8.0/src/extensions/python"],
    ),
)
Here are some notes about setup.py:
- pywrapfst.pxd depends on cintegral_type.pxd, which depends on base/integral_types.h. It seems that or-tools includes this header, so I’ve manually copied it to include/base/integral_types.h and added "include" to the include_dirs argument of Extension.
- Since headers in src/include/fst/extensions are not installed by make install, "openfst-1.8.0/src/include" is also added to the include_dirs argument.
- "openfst-1.8.0/src/extensions/python" is added to the include_path argument of cythonize so that .pxd files such as pywrapfst.pxd can be found at compile time.
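Because the build depends on files that make install does not provide, it can help to check that they are in place before running setup.py. The following is a minimal sketch under that assumption: missing_build_inputs is a hypothetical helper, and the paths are simply the ones used in the setup.py above, relative to the project root.

```python
from pathlib import Path

# Paths taken from the setup.py in this article, relative to the project root.
REQUIRED_PATHS = [
    "include/base/integral_types.h",
    "openfst-1.8.0/src/include",
    "openfst-1.8.0/src/extensions/python/pywrapfst.pxd",
]


def missing_build_inputs(root="."):
    """Return the required paths that do not exist under root."""
    base = Path(root)
    return [path for path in REQUIRED_PATHS if not (base / path).exists()]


# An empty list means every expected header and .pxd location was found.
print(missing_build_inputs())
```

An empty list only says that the paths exist; it does not guarantee that the header contents match what pywrapfst.pxd expects.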
Then, build the wrapper. The --build-lib option specifies where to output the .so file. Without this option, the .so file will be placed in build/lib.linux-x86_64-3.6/.
# python3 setup.py build_ext --build-lib .
# ls *.so
fstsample.cpython-36m-x86_64-linux-gnu.so
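The long file name comes from the interpreter's extension-module suffix, which encodes the Python version and platform. As a small sketch, the expected name can be computed with the standard sysconfig module; extension_filename is a hypothetical helper, and the ".so" fallback is an assumption for interpreters that do not define EXT_SUFFIX.

```python
import sysconfig


def extension_filename(module_name):
    """Return the file name the current interpreter expects for an extension module."""
    # Fall back to ".so" if EXT_SUFFIX is not defined (an assumption for this sketch).
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    return module_name + suffix


print(extension_filename("fstsample"))
```

With the Python 3.6 interpreter in the container, this should give the same name as the ls output above. A different interpreter produces a different suffix, which is why a module built for one Python version is not picked up by another.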
Finally, the fstsample module can be imported and the print_fst_info() function can be called from Python, which internally calls the fstsample::PrintFstInfo C++ function.
# python3
>>> import pywrapfst
>>> import fstsample
>>> fst = pywrapfst.Fst.read('sample.fst')
>>> fstsample.print_fst_info(fst)
fst type vector
arc type standard
input symbol table none
output symbol table none
# of states 3
# of arcs 3
initial state 0
# of final states 1
# of input/output epsilons 0
# of input epsilons 0
# of output epsilons 0
input label multiplicity 1
output label multiplicity 1
# of accessible states 3
# of coaccessible states 3
# of connected states 3
# of connected components 1
# of strongly conn components 3
input matcher y
output matcher y
input lookahead n
output lookahead n
expanded y
mutable y
error n
acceptor y
input deterministic y
output deterministic y
input/output epsilons n
input epsilons n
output epsilons n
input label sorted y
output label sorted y
weighted y
cyclic n
cyclic at initial state n
top sorted y
accessible y
coaccessible y
string n
weighted cycles n