
tofino-related PCI error #173

@Nieuwejaar


I found dpd hung on sled 16 on dublin. It seems to have been derailed by a PCI error during startup.

At the system level we see:

BRM23230018 # fmdump -v
TIME                 UUID                                 SUNW-MSG-ID EVENT
Dec 28 00:01:37.7986 1a7d366c-fe31-48df-b247-21cc33c4ec1c SUNOS-8000-J0 Diagnosed
   50%  defect.sunos.eft.unexpected_telemetry

        Problem in: dev:////pci@ab,0/pci1de,fff9@3,2
           Affects: -
               FRU: -
          Location: -

   50%  fault.sunos.eft.unexpected_telemetry

        Problem in: dev:////pci@ab,0/pci1de,fff9@3,2
           Affects: -
               FRU: -
          Location: -
...

At the same time, we see the following in the dendrite log:

00:01:37.560Z DEBG dpd: Set 4ns pulse config to 0xc30c30c
    module = Lld
    unit = bf-sde
00:01:37.560Z DEBG dpd: Set global ts inc config to 0xc30c30c
    module = Lld
    unit = bf-sde
00:01:37.560Z DEBG dpd: Set global PSC inc config to 0xaab
    module = Lld
    unit = bf-sde
00:01:39.925Z INFO dpd: bf_device_add dev id 0, is_sw_model 0
    module = Dvm
    unit = bf-sde

The roughly 2.4-second gap between the last two messages (00:01:37.560Z to 00:01:39.925Z) is unusual and coincides with the timing of the PCI error reported by fmdump (00:01:37.7986). Interestingly, dendrite does not report any PCI-related issue until nearly a minute later:

00:02:37.118Z INFO dpd: Entering pipe_mgr_config_complete, dev 0
    module = Pipe
    unit = bf-sde
00:02:37.181Z DEBG dpd: LLD: FAULT: DMA error: dev_id=0, d0=00039a83cc00014f, d1=0000010000000016
    module = Lld
    unit = bf-sde
00:02:37.181Z DEBG dpd: FAULT: 3 : 0000000000000000 : 00039a83cc00014f : 0000010000000016
    module = Lld
    unit = bf-sde
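The timing argument above depends on spotting unusually long silences between consecutive dpd log lines. Here is a minimal sketch for finding them mechanically. It assumes the `HH:MM:SS.fffZ` line prefix seen in these excerpts and a log that does not cross midnight. `find_gaps` and the 1-second threshold are hypothetical helpers for illustration. They are not part of dendrite or its tooling.

```python
import re
from typing import Iterable, List, Tuple

# Matches the leading "HH:MM:SS.fffZ" timestamp on dpd log lines, as in the
# excerpts above. Continuation lines ("module = ...", "unit = ...") have no
# timestamp and are skipped.
TS_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)Z\s+(.*)$")


def seconds_of_day(h: str, m: str, s: str) -> float:
    """Convert an HH, MM, SS.fff triple into seconds since midnight."""
    return int(h) * 3600 + int(m) * 60 + float(s)


def find_gaps(lines: Iterable[str], threshold: float) -> List[Tuple[float, str, str]]:
    """Return (gap_seconds, previous_message, next_message) for every pair of
    consecutive timestamped lines spaced at least `threshold` seconds apart.
    Assumes all lines fall on the same day (no midnight rollover handling)."""
    gaps = []
    prev = None  # (time, message) of the last timestamped line seen
    for line in lines:
        match = TS_RE.match(line.strip())
        if not match:
            continue
        h, m, s, msg = match.groups()
        t = seconds_of_day(h, m, s)
        if prev is not None and t - prev[0] >= threshold:
            gaps.append((round(t - prev[0], 3), prev[1], msg))
        prev = (t, msg)
    return gaps


# The tail of the first dendrite excerpt from this issue.
LOG = """\
00:01:37.560Z DEBG dpd: Set global PSC inc config to 0xaab
    module = Lld
    unit = bf-sde
00:01:39.925Z INFO dpd: bf_device_add dev id 0, is_sw_model 0
    module = Dvm
    unit = bf-sde
"""

if __name__ == "__main__":
    for gap, before, after in find_gaps(LOG.splitlines(), threshold=1.0):
        print(f"{gap:.3f}s gap: {before!r} -> {after!r}")
```

Run against the first excerpt, this reports the single gap between the "Set global PSC inc config" and `bf_device_add` messages. Only timestamped lines are compared, so the indented `module`/`unit` continuation lines do not hide or split a gap.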
