Guru Meditation Error - Pycom WiPy
@oligauc Output is:
>>> os.uname()
(sysname='LoPy4', nodename='LoPy4', release='1.18.1.r7', version='v1.8.6-849-d1c5ea9 on 2018-12-17', machine='LoPy4 with ESP32', lorawan='1.0.2', sigfox='1.0.1')
@milan Please can you post the output of:
import os
os.uname()
@timh self.connect is defined in the MQTTClient class and is passed to the MsgHandler class using self._msgHandler=msgHandler.MsgHandler(self._recv_callback, self.connect)
Your connect function creates a socket, but it does not send the MQTT CONNECT packet to the AWS server.
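To illustrate why opening the socket is not enough: MQTT requires the client to send a CONNECT packet as its first message and then wait for a CONNACK from the broker. Below is a minimal sketch of an MQTT 3.1.1 CONNECT (clean session, no will, username or password) and a CONNACK check, assuming raw bytes over an already-open (TLS) socket. The helper names are hypothetical; this is not the AWS library's own code.

```python
import struct


def encode_remaining_length(n):
    """Encode n with MQTT's variable-length scheme (7 bits per byte, high bit = more bytes follow)."""
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80
        out.append(byte)
        if n == 0:
            return bytes(out)


def mqtt_connect_packet(client_id, keepalive=60, clean_session=True):
    """Build a minimal MQTT 3.1.1 CONNECT packet (no will, username or password)."""
    cid = client_id.encode("utf-8")
    flags = 0x02 if clean_session else 0x00
    variable_header = (
        struct.pack("!H", 4) + b"MQTT"    # protocol name, length-prefixed
        + bytes([0x04])                   # protocol level 4 = MQTT 3.1.1
        + bytes([flags])                  # connect flags (bit 1 = clean session)
        + struct.pack("!H", keepalive)    # keep-alive interval in seconds
    )
    payload = struct.pack("!H", len(cid)) + cid  # client identifier
    body = variable_header + payload
    # Fixed header: packet type 1 (CONNECT) in the high nibble, then remaining length.
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


def connack_ok(data):
    """True if data is a CONNACK (type 2, remaining length 2) with return code 0 (accepted)."""
    return len(data) >= 4 and data[0] == 0x20 and data[1] == 0x02 and data[3] == 0x00
```

On the device, writing `mqtt_connect_packet(...)` to the new socket and reading back the 4-byte CONNACK would complete the handshake before any PUBLISH is sent.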
@timh When the handler sees a disconnection, _connect_helper() is called, which in turn calls connect() in the MQTTClient class.
The poll registration is done each time a new socket is created. Look at createSocketConnection().
The firmware has an embedded garbage collector.
... and one more! I had five WiPys running over the weekend; one died after roughly 37,000 messages sent to AWS:
Guru Meditation Error: Core 1 panic'ed (Unknown reason)
And that was it. I'd really appreciate a fix here.
@timh I've been going over the AWS MQTT msgHandler further, and I see a number of other bugs.
For instance, if the handler sees a disconnection, it tries to reconnect (though, of course, connect() is missing).
The poll registration on the original socket still exists, and no poll registration is created when a new socket is created.
When I finish reviewing, I will post an updated message handler separately.
Some of my earlier changes (above) don't make things worse, but they are incorrect or incomplete.
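The poll problem described above can be sketched as follows: before a replacement socket is used, the old one is unregistered from the poll object and closed, and the new one is registered. `PolledConnection` and `socket_factory` are hypothetical names; on the device, `socket_factory` would stand in for whatever createSocketConnection() does. The sketch uses `select.poll()`, which CPython provides on Unix and MicroPython provides as `select`/`uselect`.

```python
import select
import socket


class PolledConnection:
    """Hypothetical sketch: keep exactly one live socket registered with the poll object."""

    def __init__(self, socket_factory):
        # socket_factory: assumed callable returning a new connected socket.
        self._socket_factory = socket_factory
        self._poll = select.poll()
        self._sock = None

    def reconnect(self):
        # Unregister the stale socket *before* closing it; otherwise the poll
        # object keeps watching a dead descriptor.
        if self._sock is not None:
            self._poll.unregister(self._sock)
            self._sock.close()
            self._sock = None
        self._sock = self._socket_factory()
        # Register the new socket so incoming data is noticed again.
        self._poll.register(self._sock, select.POLLIN)
        return self._sock

    def wait_readable(self, timeout_ms):
        # True if the current socket reported any event within timeout_ms.
        return bool(self._poll.poll(timeout_ms))
```

Unregistering before `close()` matters in CPython as well: once a socket is closed its `fileno()` is -1, and the poll object can no longer be told which descriptor to drop.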
@milan The master branch on GitHub https://github.com/pycom/pycom-micropython-sigfox almost always reflects the latest build, which you can install with pyupgrade. The version number is in the file https://github.com/pycom/pycom-micropython-sigfox/blob/master/esp32/pycom_version.h.
I am not sure which commit to check out to get 1.18.1.r7. The commit log is not detailed with respect to that file.
@robert-hh OK, but I can't find this release on https://github.com/pycom/pycom-micropython-sigfox/releases
Where can I find the source for this release?
@milan You have to build your own firmware to get the .elf file. It will be in esp32/build/WIPY/release.
@robert-hh Unfortunately I can't share the code with you, sorry. The firmware version is v1.18.1.r7. Also, where can I find the .elf file for this version? I couldn't find it on GitHub.
@timh I also suspect AWSIoT or threads. I'll look into your suggestions - thanks!
@martinnn I wonder whether your Guru Meditation could be due to the AWS MQTT library (threads).
I found I had to make some fixes to that library. (I use the MQTTClient and MQTTMessageHandler classes to connect to a non-AWS service that uses TLS.)
I found there were problems. For instance, self.connect() is called in _verify_connection_state(), but no connect method is defined.
That in itself shouldn't cause a Guru Meditation, but the next problem, thread handling, could lead to one.
I found that if a disconnection occurs, the underlying thread for the message handler isn't killed.
In fact, I think there is no mechanism in that code to disconnect and reconnect reliably, so threads are left running over time. Eventually this causes a fault.
I added connect() and modified disconnect():
def disconnect(self):
    if self._sock:
        self._sock.close()
        self._sock = None
    self._conn_state_mutex.acquire()
    self._exitRequest = True
    self._conn_state_mutex.release()

def connect(self):
    self.disconnect()
    self._conn_state_mutex.acquire()
    self._exitRequest = False
    self._conn_state_mutex.release()
    self.createSocketConnection()
I also added a flag in the handler to mark a request to close the thread, and a guard in the io_thread loop:
if self._exitRequest:
    _thread.exit()
    raise mqttConst.MQTTDisconnected
This seems to have made the library more robust, at least for me.
There does still seem to be one other runaway thread issue, but my WDT catches that one.
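A minimal sketch of the exit-flag idea described above, using the `_thread` module that both CPython and MicroPython provide. The attribute names mirror the post, but the class itself is hypothetical, not the library's handler. One design choice differs from the connect() shown above: `stop()` waits until the old thread confirms it has finished. If connect() clears `_exitRequest` right after disconnect() sets it, the old thread may never see the request and keeps running. The sketch assumes the port's lock objects support `with`; if not, use acquire()/release() as in the post.

```python
import _thread
import time


class MessageHandlerSketch:
    """Hypothetical IO-thread owner with a lock-guarded exit flag."""

    def __init__(self):
        self._conn_state_mutex = _thread.allocate_lock()
        self._exitRequest = False
        self._running = False
        self.loops = 0

    def _exit_requested(self):
        with self._conn_state_mutex:
            return self._exitRequest

    def _io_thread(self):
        try:
            while not self._exit_requested():
                self.loops += 1  # real code would poll/read the socket here
                time.sleep(0.01)
        finally:
            # Returning from the thread function ends the thread,
            # the same effect as calling _thread.exit().
            with self._conn_state_mutex:
                self._running = False

    def start(self):
        with self._conn_state_mutex:
            self._exitRequest = False
            self._running = True
        _thread.start_new_thread(self._io_thread, ())

    def stop(self, timeout_s=1.0):
        # Request exit, then wait until the thread has actually finished,
        # so a later start() cannot clear the flag before the thread sees it.
        with self._conn_state_mutex:
            self._exitRequest = True
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            with self._conn_state_mutex:
                if not self._running:
                    return True
            time.sleep(0.01)
        return False
```

A reconnect would then be `stop()`, create the new socket, `start()`, so at most one IO thread exists at any time.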
Same here: a Guru Meditation Error on a WiPy with the latest release firmware, sending data every 2 s to AWS via the built-in AWSIoT library. I'm also using I2C and asynchronous communication. It happens rarely, but it's a killer for a device deployed in the field (it stays dead until someone presses the reset button). We'll add a hardware watchdog for this.
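Alongside an external watchdog, the firmware's own `machine.WDT` can be fed from the main send loop, as in the sketch below. Whether it can recover from every kind of panic is not established in this thread; an external hardware watchdog does not depend on the ESP32 itself still running. The `run()` helper and the stub `WDT` class (used only when `machine` cannot be imported, e.g. on desktop CPython) are hypothetical.

```python
import time

try:
    from machine import WDT  # watchdog class in Pycom/MicroPython firmware
except ImportError:
    class WDT:
        """Hypothetical stand-in so the loop can run on desktop CPython."""

        def __init__(self, timeout):
            self.timeout = timeout
            self.feeds = 0

        def feed(self):
            self.feeds += 1


def run(send_once, iterations=None, period_s=2, timeout_ms=10000):
    """Call send_once() every period_s seconds and feed the watchdog after each call.

    If send_once() hangs, or raises and ends the loop, the watchdog is no
    longer fed and, on the device, resets the board after timeout_ms.
    iterations=None loops forever, as a deployed device would.
    """
    wdt = WDT(timeout=timeout_ms)
    count = 0
    while iterations is None or count < iterations:
        send_once()
        wdt.feed()
        count += 1
        time.sleep(period_s)
    return wdt
```

The timeout has to be comfortably longer than the send period plus the slowest expected send, otherwise the board resets during normal operation.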
@milan That certainly should not happen with Python. Do you have more information, such as the piece of code that crashed or the firmware version?